Studies in Industrial Empathy:
III. A Study of Supervisory Empathy in the Textile Industry *


Wendell M. Patton, Jr.

Bruce Payne & Associates, Inc.


Summary 


The increasing difficulties and growing responsibilities
of the position of supervisor,
particularly in the realm of human relations,
suggest the need for the investigation of the
ability of supervisors to understand both 
their superiors and subordinates. With this 
in mind, a study was undertaken of the em- 
pathetic ability of these supervisors and the 
extent to which this ability was related to 
other psychological variables. 

Data were obtained from a large textile 
manufacturing plant producing prints and 
materials from spun rayon. The results are 
based on the replies of 54 secondhands or 
front-line supervisors, 18 members of top 
management, and a random sample of 243 
out of 2,496 employees. 

It was found that the secondhands were 
not empathizing effectively with either labor 
or management. Instead they were project- 
ing, positively toward labor and negatively 
toward management. A social-psychological 
gap existed between labor and management 
which the supervisors were unable to perceive.
Intelligence, education, and scores
on the test, How Supervise? (5) were posi- 
tively related to the supervisors’ empathetic 
ability for both labor and management; age 
and supervisory experience were negatively 
related, while the particular shift and de- 
partment in which a supervisor was employed 
apparently had no effect on empathetic
ability. Empathetic ability was no greater
among supervisors who were considered by
management to be the best than among those
considered by management to be the worst.
Intercorrelations between related variables and
the supervisors’ empathy scores showed that
the supervisors’ own attitudes and knowledge
were the chief factors influencing projection.
The findings indicated important individual
differences in empathetic ability and the
possibility of predicting from a regression
equation the supervisor’s ability to empathize
with either labor or management.

* This paper is based upon the writer’s doctoral
research directed by Dr. H. H. Remmers, Purdue
University. The dissertation, A Study of Certain
Psychological Variables Related to Supervision in the
Textile Industry, is on file in the Purdue University
Library.


The Problem


Today the American industrial system has 
become a house divided against itself. In 
industrial enterprise the supervisor is the 
direct connecting link between labor and 
management. The increasing difficulties, com- 
plications, and responsibilities of textile su- 
pervision have made it increasingly neces- 
sary to devote more effort to determining 
some of the psychological characteristics of 
good leadership and of the men now occupy- 
ing these positions. In the final analysis it 
is the supervisor who determines whether or 
not a given worker will keep his job or be 
fired or promoted. It is this supervisor who 
gives the orders and carries the directives of 
management to the workers. It is this same 
supervisor who has the only direct personal 
contact with the workers, and to these work- 
ers his actions and decisions are direct ex- 
pressions of company policy. Since efficient 
supervision demands a two-way channel of 
communication, it appeared likely that those 
individuals who have the ability to “put 
themselves in the other fellow’s shoes” and
anticipate their responses would best be able 
to carry the directives of management to 
labor and the needs and attitudes of labor 
to management. This ability (empathy) and 
its relation to other psychological variables 
of supervision constitute the basis of this 
study.


Background


Though the concept of empathy is of comparatively
recent origin, the possibilities of its value
in various situations have not escaped the attention
of investigators. Remmers (8), for example,
used this concept when he was called
upon to develop an experimental design to test 
the procedures used to reduce the social-psycho- 
logical gap between labor and management in a 
large industrial organization. Davidoff (1) was 
concerned with the reciprocality of empathy be- 
tween Negroes and whites while Miller and Rem- 
mers (7) studied the psychological distance be- 
tween organized labor and management. Inter- 
est in the attitudes of labor leaders toward 
industrial supervision was shown by Remmers 
and Remmers (9) while Richards (10) meas- 
ured the empathetic ability of both labor and 
management. Travers (12), in a study of pre- 
dicting public opinion, found that individuals 
tend to overestimate the percentage of the group 
being judged who feel as they themselves feel. 
Projection of this nature which renders the em- 
pathetic process difficult is readily observable 
about us in relationships such as between parents 
and children, teachers and pupils, and others.

The present study was attempted in part as a 
service project for the plant in which it was con- 
ducted and consequently leaves many facets un- 
touched. Even so, many avenues for additional 
research were uncovered. The development of 
some objective measure of the concept of em- 
pathy as well as the study of the effect of train- 
ing upon empathy would certainly be well worth 
while. If individuals can be taught to empathize 
more closely, the effect would far exceed the con- 
fines of an industrial situation. If this ability is 
not affected by training, then the problem be- 
comes one of selection. The reciprocality of 
empathy would also be an interesting area of 
study. It has been suggested that empathetic 
ability is reciprocal in nature so that it is easier 
to empathize with an individual who has high 
empathetic ability than one who has low ability 
(4). A knowledge of the many variables affect- 
ing empathy and the empathetic ability within 
an individual at different times would also add to 
our meager information on this subject. 

The results of these studies as well as Libo’s 
(6) and Dymond’s (2, 3, 4) suggest that em- 
pathetic ability is important for directing the 
work of others. 




Procedure 


Remmers (8) operationally defines empathy as 
“. . . having the subject or subjects predict the
ordinal or cardinal position of another individual 
or group on one or more scales of defined psy- 
chological dimensions.” The scale chosen for 
this study was How Supervise? (5). This test 
was administered to all front-line supervisors
(secondhands), general foremen (overseers),
members of top management, and to a random
sample of 10% of the employee group. It was 
also administered to the secondhands on two 
other occasions: once with instructions to an- 
swer each question as they believed management 
would answer it, and again with instructions to 
answer each question as they believed the em- 
ployees would answer it. For the purpose of 
this study the scoring consisted of counting the 
correct responses as determined by the answer 
key. The index of empathy was computed by 
determining the difference between the predicted 
scores for a given group and that same group’s 
actual mean score. 
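
The index of empathy described above reduces to simple arithmetic. The following Python sketch assumes the index is taken as a signed difference (predicted score minus the group's actual mean), so that a positive value means the group was overestimated. The function name and the three individual predicted scores are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def empathy_index(predicted_score: float, actual_group_mean: float) -> float:
    """Signed index: positive values mean the group was overestimated."""
    return predicted_score - actual_group_mean


# Hypothetical predictions, by three supervisors, of labor's score.
predicted_for_labor = [50.0, 47.0, 52.0]

# Actual mean score of the labor group (the value reported in Table 2).
actual_labor_mean = 44.14

indices = [empathy_index(p, actual_labor_mean) for p in predicted_for_labor]
mean_index = mean(indices)

print([round(i, 2) for i in indices], round(mean_index, 2))
```

An index near zero would indicate accurate empathy under this definition; the sign shows the direction of any projection.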

These same supervisors were also administered 
The Adaptability Test (11) which was designed 
to yield a general measure of intelligence. In- 
formation such as age, sex, experience and edu- 
cation was obtained from the personnel records 
and a personal history blank which all supervi- 
sors completed. Since no suitable production 
records were available for a criterion of super- 
visory efficiency, ratings of supervisors by su- 
periors were used. Each supervisor was rated by 
at least three superiors and from these ratings a 
rank-order list was formed. Data from these 
sources served to test relevant hypotheses. 




Results 


The extent to which textile supervisors, 
labor and management understand the psy- 
chologically best methods of supervision is 
shown in Table 1. The front-line supervisors
scored higher than labor, but management
scored higher than either the supervisors
or the workers. These differences were
significant at the 1% level of confidence.


Table 1

Comparison of the Mean Scores of Labor, Front-line
Supervisors and Management on How Supervise?

                                           Standard      Standard Error
Group                    Number    Mean    Deviation     of Mean
Labor                      243     44.1    [illegible]   [illegible]
Front-line Supervisors      54     48.1        8.5           1.1
Management                  18     53.8        4.7           1.14


Table 2

Comparison Between the Actual Social-Psychological
Distance Between Management and Labor as Measured by
How Supervise? and the Front-line Supervisors’
Prediction of this Distance

                    Management    Labor    Difference
Actual Mean            53.84      44.14       9.70
Standard Error          1.14        .66       1.31
Predicted Mean         46.87      48.72      -1.85
Standard Error          1.16       1.04
Difference              6.97      -4.58      11.55
Standard Error          1.62       1.23


Table 2 shows a comparison between the 
actual social-psychological distance between 
management and labor and the supervisors’ 
perception of this difference. The super- 
visors tended to overestimate the workers’ 
knowledge of the best methods of supervi- 
sion by a mean difference of 4.58. They also 
tended to underestimate management’s knowl- 
edge of the best methods of supervision by a 
mean difference of 6.97. Both of these dif- 
ferences were significant at the 1% level of 
confidence. The social-psychological distance 
is indicated by the difference of 11.55 which 
is also significant at the 1% level of con- 
fidence. From the data shown in Table 2 
it becomes obvious that the supervisors are 
not empathizing optimally with either man- 
agement or labor. Instead they are project- 
ing in both cases: positively toward labor and 
negatively toward management. 
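
The differences and standard errors reported in Table 2 can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the paper does not state: that the means being compared are treated as independent, so the standard error of a difference is the square root of the sum of the squared standard errors, and that a critical ratio above 2.58 corresponds to the 1% level. The function names are illustrative only.

```python
import math


def se_of_difference(se_a: float, se_b: float) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def critical_ratio(difference: float, se_diff: float) -> float:
    """Absolute difference divided by its standard error."""
    return abs(difference) / se_diff


# (actual mean, its SE, predicted mean, its SE), as given in Table 2.
groups = {
    "management": (53.84, 1.14, 46.87, 1.16),
    "labor": (44.14, 0.66, 48.72, 1.04),
}

results = {}
for name, (actual, se_actual, predicted, se_pred) in groups.items():
    diff = actual - predicted
    se_diff = se_of_difference(se_actual, se_pred)
    ratio = critical_ratio(diff, se_diff)
    results[name] = (round(diff, 2), round(se_diff, 2), ratio)
    # Assumed threshold of 2.58 for the 1% level.
    print(name, results[name][0], results[name][1], ratio > 2.58)
```

Under these assumptions the computed standard errors agree with the printed values of 1.62 and 1.23 to within rounding, and both critical ratios exceed the assumed threshold.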

A Pearson r of + .61 was found between 
the supervisors’ own scores on the test How 
Supervise? and the scores they predicted for 


labor. A Pearson r of + .44 was found be- 
tween the supervisors’ own scores and the 
scores they predicted for management. Both 
correlations were significant at the 1% level 
of confidence. This relationship very clearly 
implies that one important reason for the 
failure of supervisors to empathize is the 
projection of their own attitudes and knowl- 
edge to these other groups. 

Of the supervisory variables investigated, 
five appeared to be related to empathetic 
ability to such extent as to be considered of 
practical importance in the prediction of this 
ability. The intercorrelations of these five 
variables are shown in Table 3. The multiple
correlation between the variables shown
and the supervisors’ predictions of management’s
responses was $R_{0.12345} = +.53$.
The relationship between the same
variables and the supervisors’ predictions of
labor’s responses was $R_{0.12345} = +.67$.
When these correlations are compared
with the correlations for the single variable
of the supervisors’ own scores, it is evident
that this particular variable contributes
the most toward the total relationship.

A Pearson r of + .10 was found between 
the rank order of the supervisors as rated by 
their superiors and their predictions for labor. 
The correlation between this rank-order list 
and predictions for management was found to 
be + .09. Both of these correlations were 
too small to be of significance to this study. 
Consequently, it appears that the supervisors 
considered by management to be their best 
are able to empathize only about as well as 
those considered by management to be the 
poorest. Because of the small number of 
cases and the inherent weaknesses in the rank 
order, the results must be interpreted with 
caution. 


Table 3

Intercorrelations of Five Variables Found to be Related to the Supervisors’ Ability to Empathize

                              Adaptability            Supervisory
                              Scores          Age     Experience    Education
1. How Supervise? Scores        +.48          -.29      -.2[?]        +.37
2. Adaptability Scores                        -.44      -.2[?]        +.68
3. Age                                                  +.7[?]        -.45
4. Supervisory Experience                                             -.32

Conclusions


In short, the findings indicate that the tex- 
tile supervisors are unable to empathize op- 
timally with either labor or management and 
that empathetic ability is related to certain 
psychological variables of supervision in the 
textile industry. The empathetic ability of 


a supervisor as here operationally defined 
can be predicted from a regression equation. 
The principal reason for the failure of the 
supervisors to understand management and 
labor is the projection of their own feelings, 
attitudes and knowledge upon these groups. 


Received October 7, 1953 
Organization Control in Business


L. R. Gaiennie 


Personnel Department, Fairbanks, Morse & Co., Chicago, Illinois 


It is becoming increasingly apparent that 
traditional organization charts, with their job 
titles and lines of authority, represent only 
one aspect of a business organization. They 
are two-dimensional still lives of a living in- 
stitution and can be likened to anatomy as 
contrasted with physiology. Anatomy stud- 
ies parts and organs of the body at rest, 
whereas physiology attempts to understand 
them in action. Organization charts fail to 
show actual relations between the jobs and 
people of an institution. This is because the 
various positions of a company are populated 
by human beings who are constantly acting
and interacting with each other and reacting to the
changing conditions of business. Of course,
such charts have their proper place in per- 
sonnel control. Scientific personnel work 
was founded upon analysis of the jobs and 
functions of an organization. It has become 
commonplace to point out that job descrip- 
tions are to the personnel executive what blue- 
prints and material specifications are to the 
engineer. Too often, however, the personnel 
executive accepts his organization merely be- 
cause it has been formalized by charts and de- 
scriptions, and proceeds to select and train 
employees to fill the positions thus created. 

Undue emphasis upon job relationships 
leads to an engineering-minded personnel ad- 
ministration. Such departments tend to treat 
people as a means rather than as an end in 
themselves. People are categorized as units 
of energy, not as people. The attempt to use 
people as a means rather than an end alienates
them from a sense of belonging with
management to the economy as we know it. 
Work likewise becomes a means—something 
foreign to a person’s real interests and goals; 
something with which to obtain an automo- 
bile or television set; something to be given 
sparingly as a cost rather than a good in it- 
self. One of the reasons why this is so lies 
in the fact that management, under the in- 
fluence of an atomistic engineering science, 
has broken down its job organization in such 


a way as to deprive employees of much of 
their creative relationship to work. The se- 
lection-minded approach to filling jobs so 
created has led to discouragingly meager re- 
sults over the last twenty years. 

It has become obvious that the personnel 
administrator’s functions must go beyond 
mere analysis and acceptance of his organiza- 
tion. After all is said and done, personnel 
efficiency is measured by the success of the 
company and the people in it. Today, per- 
sonnel administrators and psychologists are 
thus enlarging their field of interest: upward
from the skilled hourly workers and outward 
toward the relationship between the positions 
in a given organization. The larger concept 
of “organization control” is enriching the 
older field of “employee selection.” This 
growth and interest is indeed heartening and 
represents the growing maturity of both per- 
sonnel and industrial psychologists. A means 
of relating these two aspects of industrial or- 
ganization to each other is now needed if ade- 
quate organization control is to be achieved. 
If these two structures can be measured in 
similar terms, then some progress may be 
made, since progress in any field is largely 
dependent upon quantifying the data under 
consideration. 

To perform his function effectively, the 
personnel executive must take a critical look 
at both the “job structure” and the “people 
structure” of his company. Both functions 
and people within the company must com- 
bine to produce harmony and profits. Thus, 
business organization has at least two struc- 
tures: (1) the make-up and relationship be- 
tween the various positions of the company; 
and (2) the make-up and relationship be- 
tween the various persons occupying these 
positions. As has been pointed out, these two 
structures are merely separate aspects of the 
same problem. Complete understanding of 
each is dependent upon understanding the 
other, and the purpose of personnel control 
is to achieve proper job and personnel structures
and assure a balance between them.
Organization control is an attempt to meas- 
ure degrees of conformity between the job or- 
ganizational requirements, and the abilities 
and performance of job incumbents. Organi- 
zational efficiency is, in this respect, equivalent 
to the cost accountant’s measure of standard 
versus actual performance. In this sense, or- 
ganization control is designed to lead toward 
remedial or preventive action and as such lifts 
the older concept of employment personnel 
to a new and more dynamic level. 

For some time, various rating techniques 
have been used for measuring the relative 
complexity of jobs as a means of relating all 
jobs to one another. These techniques, when 
applied to management positions, are usually 
termed “position evaluation” and form the 
basis of most modern salary administration. 
Considerable work has been done to estab- 
lish the reliability of such data and, in gen- 
eral, it has been accepted by employees and 
management as the most rational and objec- 
tive basis so far developed for measuring the 
relative value of jobs. In order to achieve 
our objective of relating the two aspects of 
organization one to the other, evaluation ele- 


ments which can be applied with equal ease 
to both jobs and people must be used. Ex- 
amples of such elements are: planning ability, 
skill with people, job knowledge, quality of 


work, responsibility, and experience. All the 
elements (factors) are defined in a manner
equally applicable to both the job and the 
job incumbent, as are the various grades 
within each element. 

By applying the usual rules of job evalua- 
tion to all of the management positions, data 
are obtained. After this has been done, each 
job incumbent is rated for the same elements 
against the ratings for the job he occupies. 
Evaluation of job incumbents is performed 
in a completely separate series of rating ses- 
sions. The same or different people may be 
used. The same principles are applied in 
evaluating people as are used in evaluating 
the jobs. 

The evaluating sessions for job incumbents 
differ from the position evaluation series only 
in that: (a) different evaluators may be used 
for the two sessions; (b) one series evaluates 
people and the other evaluates positions; and 




(c) the incumbent is rated against the re- 
quirements for the position he occupies. 

In the studies so far made, the data from 
the two evaluations have been entered on mas- 
ter cards. These measures are then treated 
statistically to arrive at total scores for each 
job and each job incumbent. Comparisons 
between positions and people can then be 
made and it can readily be seen if an em- 
ployee exceeds, equals, or is beneath the job 
requirements. These techniques are subject 
to all of the limitations and errors of any 
rating procedure. 

When similar cards have been filled out for 
all employees and positions, an almost in- 
finite number of comparisons can be made. 
Some of the most obvious are: 


1. Comparisons between jobs. This can be
used for such purposes as to obtain a better 
organization, increase or decrease the job 
content of certain positions, or to establish a 
salary structure. 

2. Comparisons between people. Since all 
the people have been evaluated on the same 
basis, direct comparisons can be made. 

3. Comparisons of jobs and people. This 
information can be used for training, organi- 
zation control, upgrading, and standardiza- 
tion of psychometric tests. 

This approach highlights the fact that re- 
duced variances between the requirements of 
the job and the abilities of job incumbents 
can be achieved in several ways; namely: 


1. By modifying job content. Where a 
discrepancy exists between the job and the 
incumbent it is possible to add or subtract 
duties and responsibilities. 

2. By changing personnel. Balance be- 
tween job and personnel can be brought about 
through training or transfer of personnel to 
achieve maximum use of company manpower. 

3. By changing both job content and per- 
sonnel. 

If the evaluation data for positions and peo- 
ple thus obtained are charted, the resulting 
series of curves demonstrate in quantitative 
terms the organization as a whole. That is to 
say, it now becomes possible to study both 
the groups of jobs and the groups of people, 
since all have been reduced to common denominators.
Curves of this kind are developed
by arranging the positions of the organization
in their order of complexity. It is
then possible to plot both personnel and posi- 
tion ratings, keeping the data arranged in the 
order of the total position evaluations. 
Separate charts for each evaluation element 
are also possible. For purposes of the present 
discussion, the following terms are defined: 


1. Position Gradient—that curve obtained 
by listing the positions in ascending order of 
their total position evaluation scores along 
the abscissa and plotting their total or ele- 
ment position scores on the ordinate. 

2. Personnel Gradient—that curve obtained 
by listing the job incumbents in ascending 
order of their total position evaluation scores 
along the abscissa and plotting their total or 
element personnel evaluation scores on the 
ordinate. 

3. Positive Variance—any portion along a 
total or element evaluation curve where the 
abilities of the job incumbent are judged to 
exceed their job requirements. 

4. Negative Variance—any portion along a 
total or element evaluation curve where the 
job requirements exceed the abilities of the 
job incumbent. 
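
The four definitions above can be illustrated with a short Python sketch. It assumes that each position carries a single total evaluation score and that its incumbent carries a total personnel score on the same scale; the class, the function names, and the three sample positions with their scores are hypothetical and are not drawn from the studies described.

```python
from dataclasses import dataclass


@dataclass
class Position:
    title: str
    position_score: float   # total position evaluation score
    incumbent_score: float  # total personnel evaluation score of the incumbent


def gradients(positions):
    """Order positions by position score and return both curves."""
    ordered = sorted(positions, key=lambda p: p.position_score)
    position_gradient = [p.position_score for p in ordered]
    personnel_gradient = [p.incumbent_score for p in ordered]
    return ordered, position_gradient, personnel_gradient


def variance_sign(p: Position) -> str:
    """Positive when the incumbent exceeds the job, negative when beneath it."""
    diff = p.incumbent_score - p.position_score
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none"


# Hypothetical organization of three management positions.
staff = [
    Position("Plant manager", 410, 395),
    Position("Foreman", 250, 270),
    Position("Chief clerk", 300, 300),
]

ordered, pos_curve, per_curve = gradients(staff)
for p in ordered:
    print(p.title, p.position_score, p.incumbent_score, variance_sign(p))
```

Plotting `pos_curve` and `per_curve` against the same ordering of positions would give the two gradients; the same functions could be applied to the scores for a single element rather than the totals.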


Hypotheses for Experimental Test 


Besides practical value to personnel execu- 
tives, this method of analysis, in spite of its 
limitations, has certain advantages to those 
interested in organizational theory because of 


its quantitative nature. An almost infinite 
number of relationships between the personnel 
and position data can be isolated and made 
the subject of more detailed study. Because 
there has been so little research in this field, 
the following suggestive questions and hy- 
potheses are stated as a means of stimulating 
further work on these and related organiza- 
tional problems: 


1. Can selective devices, such as standard- 
ized tests, be developed using job requirements 
as the criterion? The typical test standardization
process in industry has been to select a
group of employees working on related jobs; 
to separate the good from the poor perform- 
ers and then to standardize the tests on this 
criterion. Two major weaknesses are in- 
herent in this approach: (1) the groups used 


in such a process tend to be small, thus re- 
ducing the reliability of the data; and (2) 
ability measures do not necessarily measure 
employee performance due to such factors as 
motivation. By using job evaluation ele- 
ments such as “planning ability” as the cri- 
terion, whole populations can be tested and 
used in standardizing tests. This approach 
allows for cutting scores by job type and 
separates out the motivational aspect from 
the testing devices for separate measures. 

2. Is it better to place individuals in posi- 
tions which just equal, exceed, or are less 
than their abilities? This question strikes
more directly at the problem of motivation 
and related problems. Using data obtained 
through this technique, it is possible to segre- 
gate separate populations from one or more 
organizations as follows: (a) greatly exceed 
job requirements; (b) exceed job require- 
ments; (c) equal job requirements; (d) be- 
neath job requirements; and (e) greatly be- 
neath job requirements. Having isolated the 
groups for study, various experimental de- 
signs can be used to ascertain their relative 
efficiency. If desired, it is also possible to 
segregate still other subpopulations within 
each one of the above groups. For example, 
those who exceed job requirements could be 
subdivided into those who equal, are beneath, 
or are above the job requirements on a par- 
ticular element. 

3. What are the effects of personnel re- 
versals upon organization performance and 
morale? There is evidence that some of the 
most uncooperative union stewards are an- 
tagonistic because they are more capable than 
the foreman over them. Applied to manage- 
ment organization, what are the practical re- 
sults of such a situation? Is it the same at 
all levels of an organization? The question
of reversals and their effect upon organization 
efficiency should probably be studied at three 
points on the curves: (1) those who repre- 
sent reversals; (2) their superiors; and (3) 
their subordinates. 

4. Is it possible to have an efficient organi- 
zation without positive position and personnel 
gradients? In recent years, there has been 
considerable discussion regarding democracy
in business, expressed by such terms as “bot- 
tom-up management.” Is this feasible as 
applied to the two organizational structures
herein discussed? The proponents of bottom- 
up management, to the extent that they would 
modify personnel or position gradients, have 
an opportunity to study various type gradi- 
ents and to report the relative efficiencies of 
each. 

5. Given a particular set of conditions, 
such as size of organization, type of activity, 
etc., are there particular gradients which re- 
turn optimal results? If positive answers can 
be given to this question, businesses might 
be spared thousands of dollars in cost as 
they establish or reorganize their operations. 
There is some evidence that certain general- 
izations may be discoverable. For example, 
the writer has heard competent business ex- 
ecutives express the following ideas, which are 
open to experimental test under the method 
outlined in this paper: 


a. Job-shop organizations require more 
complex position and personnel gradi- 
ents than production organizations. 

b. Large organizations tend to develop organization
gradients significantly different
from those of small organizations.


c. “Mental” organizations or departments 
such as engineering, research, and de- 
velopment demonstrate flat gradients as 
compared to the typical manufacturing 
line organization. 


6. Are there optimal gradients which should 
be established as objectives if it is anticipated 
that a given organization is going to expand 
or contract? The organizational strains due 
to change are especially apparent during quick 
expansion or after continued long-term 
growth of a company. When this happens, 
previous methods and personnel must adjust 
to the new situation. The following hy- 
potheses are suggested as being subject to 
experimental verification: 


a. In a given business organization, if the 
job gradient expands and moves quickly 
upward on the ordinate, severe organi- 
zational strains will occur unless the per- 
sonnel gradient is caused to do likewise. 

b. Organizational strains will ensue if the
job gradient changes its relative shape 
while the personnel gradient remains the 
same. Empirically speaking, it is probable
that production control installations
have a high mortality rate because they 
are frequently installed by outsiders who 
convince top management their system 
can be installed without disturbing ex- 
isting personnel. When this happens, 
the new system frequently creates severe 
organizational strain (modifying the job 
but not the personnel gradient; thus 
establishing negative variance) or else 
existing personnel through mass inertia 
finally defeat the new system. 


7. Given a particular organization, are
there optimal organizational curves which re- 
late to particular company policies and pro- 
cedures? Observations to date indicate that 
such may be the case. For example, highly 
centralized multi-plant companies display dif- 
ferent organizational gradients than highly 
decentralized multiple plant operations. Cer- 
tain corporations which have centralized or 
decentralized their organizations in recent 
years are known to have created organiza- 
tional strains which might have been reduced 
if they had considered their existing gradients
in relation to proposed objectives before
starting their programs. 

8. Can training programs be made more 
realistic and be given to those employees who 
need assistance in the particular problem 
areas uncovered? Work so far indicates that 
much of the training time spent in industry 
is of a blunderbuss variety. Companies are 
too often prone to dangle a watch in front of 
bored employees in the hope that such ses- 
sions will somehow improve performance. 
This technique allows for selective training 
or transfer of employees based upon measures 
taken. In addition, it allows for post-train- 
ing measures to ascertain the relative effec- 
tiveness of such programs. 
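
The five populations suggested under question 2 could be segregated mechanically once position and personnel scores are on a common scale. The Python sketch below is one possible way of doing so; the 5% and 20% bands that separate "equal," "exceed," and "greatly exceed" are arbitrary assumptions made for illustration, since the paper specifies no cutting points, and the function name is hypothetical.

```python
def requirement_group(incumbent_score: float, position_score: float,
                      band: float = 0.05, wide_band: float = 0.20) -> str:
    """Classify an incumbent relative to job requirements.

    The relative difference is compared with two assumed bands:
    within +/- band counts as equal, beyond +/- wide_band as 'greatly'.
    """
    ratio = (incumbent_score - position_score) / position_score
    if ratio > wide_band:
        return "greatly exceeds"
    if ratio > band:
        return "exceeds"
    if ratio >= -band:
        return "equals"
    if ratio >= -wide_band:
        return "beneath"
    return "greatly beneath"


# Hypothetical incumbents rated against a position scored at 100.
examples = [130, 110, 102, 90, 70]
groups = [requirement_group(score, 100) for score in examples]
print(groups)
```

Groups formed in this way could then be compared on whatever criterion of efficiency an investigator selects.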

The above questions and hypotheses are 
meant to be suggestive of the kinds of prob- 
lems which can be attacked experimentally 
through use of the position and personnel 
evaluation method. Further work on these 
and related problems is badly needed to 
establish factual guides for the business 
executive. 


Received October 7, 1953. 
As ordinarily used, verbal evaluations included
in performance reports furnish “impressions”
and “qualitative observations.”
Quantitative analysis of such comments can 
increase the usefulness of these reports and es- 
tablish a reliable basis for comparing individu- 
als. This paper describes the development 
and reliability of a procedure for scoring 
comments obtained from an efficiency report 
used to evaluate the job performance of 
commissioned officers in the United States 
Public Health Service. This work is part of 
the research program discussed by Newman 
(4). 

Method 


The quantitative method developed here 
for the analysis of verbal evaluations involved 
the adaptation of three well-known tech- 
niques. 

First, the method of content analysis sug- 
gested the classification of supervisors’ com- 
ments into categories. Content analysis has 
been employed extensively by Lasswell and 
his associates (3) in analyzing the political 
and propagandistic content of mass media. 

Second, the technique evolved by Thur- 
stone (7) for scaling attitudinal statements 
was used to assign values to comments classi- 
fied in each of the categories. Other investi- 
gators such as Uhrbrock (8) have applied the 
Thurstone technique to the scaling of state- 
ments concerning job performance and per- 
sonal characteristics. 

Third, methods like those introduced by 
Thorndike (6) for measuring the quality of 
handwriting were the basis for the use of a 
master scale for scoring each comment. 

The procedure used in establishing a sys- 
tem for scoring the comments in the efficiency 
report consisted of the following steps: 

1. A total of 779 comments were collected
from the “remarks,” “handicaps,” and “recommendations”
sections of several hundred
officer efficiency reports. A comment was
defined as any word, phrase, or clause constituting
a unitary evaluative description of
the officer upon whom the report was prepared.

2. By sorting and grouping all comments,
it was possible to establish 12 descriptive
categories relevant to officer characteristics
deemed important to the Service and sufficiently
independent of each other to allow
classification of comments. These categories
are listed in Table 1.


Table 1

Reliability Coefficients for Scale Placements in Each Category

Category                                             No. Items     $r_{1,1}$*     $r_{11,11}$†
General evaluation                                   69            [illegible]    .99
Potentiality for future                              32            .86            .99
Training and experience                              30            .87            .99
Relations with work associates                       35            .85            .98
Relations with official groups                       [illegible]   [illegible]    .99
Relations with patients and public                   [illegible]   [illegible]    .99
Motivation                                           [illegible]   [illegible]    [illegible]
Job proficiency                                      [illegible]   [illegible]    .99
Job progress                                         [illegible]   [illegible]    .99
Potentiality as candidate for Regular Corps          [illegible]   [illegible]    .99
Work attitudes                                       [illegible]   [illegible]    .99
Intellectual qualities                               [illegible]   [illegible]    .99
Intellectual qualities (“duplicate” items removed)‡  [illegible]   .93            .99

* $r_{1,1}$ = Average of correlation of each judge’s placements
with every other judge’s placements.

† $r_{11,11}$ = Reliability of average of 11 judges.

‡ To test the hypothesis that the high correlations
were produced by “duplicate” items, these were removed.
“Duplicate” items were defined as those
having: (a) the same adverb modifiers; and (b) scale
placements differing by no more than one place. It
may be seen that removing these items has very little
effect on the correlation.


* The writer wishes to acknowledge with appreciation
the aid of Mrs. Jane S. Harris who was Scorer 1
and who also did much of the statistical computation.


3. The comments in each of the categories
were placed on a nine-point scale from
“undesirable” (1-3), to “neutral” (4-6), to
“desirable” (7-9) by 11 Public Health Service
judges, most of whom were psychologists.
Each comment was assigned a numerical
value for use in scoring; this value was the
median of the scale placements made by the
different judges.

4. A scoring manual was constructed by
listing, in each of the 12 categories, the comments
with their median values. In using
the manual, the scorer identifies a comment
and classifies it in one of the 12 categories.
This process of identification and classification
is defined here as coding. He then
matches each comment as closely as possible
with one in its category in the manual
and assigns it the listed numerical value.
The numerical values of all comments in an
efficiency report are averaged to obtain the
raw score for the verbal evaluation parts of
the report. In this article, the entire process
of arriving at scores, involving both the coding
of comments and assigning of numerical
values to them, is termed scoring.

5. Reliabilities of the scale placements and
the scoring methods were determined.
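Steps 3 and 4 of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comments and the judges' placements are invented, the manual is flattened into a single lookup rather than 12 categories, and the scorer's judgment in matching a comment "as closely as possible" is reduced to an exact lookup.

```python
from statistics import mean, median

# Hypothetical placements (1-9 scale) by 11 judges; not data from the study.
placements = {
    "outstanding officer": [9, 9, 8, 9, 9, 8, 9, 9, 9, 8, 9],
    "adequate performance": [5, 5, 6, 5, 4, 5, 5, 6, 5, 5, 5],
    "needs close supervision": [2, 3, 2, 2, 3, 2, 1, 2, 3, 2, 2],
}


def build_manual(judge_placements):
    """Step 3: the scale value of a comment is the median of the judges' placements."""
    return {comment: median(values) for comment, values in judge_placements.items()}


def score_report(matched_comments, manual):
    """Step 4: the raw score of a report is the mean of its comments' manual values."""
    return mean(manual[comment] for comment in matched_comments)


manual = build_manual(placements)
raw_score = score_report(["outstanding officer", "adequate performance"], manual)
```

A median is used for the scale values because that is what the text specifies; the averaging in `score_report` likewise follows the description of the raw score.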


Results 


Reliability of Scale Placements. As obtained
by the method of average intercorrelation
of Peters and Van Voorhis (5), the reliabilities
of the scale placements made in
each of the 12 categories by the 11 judges
are shown in Table 1.

For comparative purposes, split-half coefficients
based on correlations of the averages
of five judges with those of five other judges
and stepped up for 10 judges by the Spearman-Brown
formula were computed for four
categories. The results for 10 judges were
similar to those obtained by the method of
average intercorrelation for 11 judges (see
Table 1): training and experience, $r_{5,5} = .95$,
$r_{10,10} = .97$; relations with work associates,
$r_{5,5} = .98$, $r_{10,10} = .99$; relations with patients
and public, $r_{5,5} = .99$, $r_{10,10} = .99$; proficiency,
$r_{5,5} = .98$, $r_{10,10} = .99$.
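The stepping-up described above can be checked with the general Spearman-Brown formula, $r_{kk} = k r / (1 + (k - 1) r)$. The sketch below assumes that going from five judges to 10 is a doubling ($k = 2$); the function and variable names are illustrative only.

```python
def spearman_brown(r, k):
    """Reliability of a composite k times as long as the part with reliability r."""
    return k * r / (1 + (k - 1) * r)


# Split-half coefficients (five judges vs. five judges) as given in the text.
split_half = {
    "training and experience": 0.95,
    "relations with work associates": 0.98,
    "relations with patients and public": 0.99,
    "proficiency": 0.98,
}

# Step each coefficient up to 10 judges (k = 2) and round to two places.
stepped_up = {name: round(spearman_brown(r, 2), 2) for name, r in split_half.items()}
```

Rounded to two places, the stepped-up values agree with the coefficients reported for 10 judges.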

Reliability of the Scoring Method. The
reliabilities of the comment scores assigned 
under various conditions are presented in 
Table 2. 

In one method of determining reliability, 
two people, one trained by the other, inde- 
pendently scored the comments in a sample 
of officer efficiency reports. The scores as- 
signed by the different scorers were correlated 
(Scorer 1 vs. Scorer 2). 

In the other method, scores assigned by the
same scorer on two different occasions were
correlated (Scorer 1 vs. Scorer 1, and Scorer
2 vs. Scorer 2). Precautions were taken to
minimize the effects of memory and other
contaminating factors. During the period intervening
between the two scorings, the scorer
worked on other efficiency reports and was
not allowed to see those utilized in the studies.
The extent of agreement in coding comments
in both of the scorings is also shown in
Table 2 for all four studies.


Table 2

Reliability of Comment Scoring

                                                        Total No. of       Comments Coded the       Correlation Between
                                          No.           Different          Same in Both Scorings    Scores on Comments
                                          Efficiency    Comments Coded                              Coded the Same in
                                          Reports       in Both                                     Both Scorings
Scorers Compared                          Scored        Scorings           No.          Per Cent    (r)
Scorer 1* vs. Scorer 2                    32            114                99           87          .86
Scorer 1 vs. Scorer 1 (4 mo. interval)    30            99                 93           94          .95
Scorer 2 vs. Scorer 2 (4 mo. interval)    39            141                [illegible]  82          [illegible]
Scorer 2 vs. Scorer 2 (14 mo. interval)   40            151                [illegible]  74          .96

* Scorer 1 was more experienced in scoring than Scorer 2.

In the comparison of the scoring done by 
Scorer 2 on two occasions, separated by a 14 
month interval, the comment scores assigned 
were also averaged to obtain a single score 
for the verbal evaluation parts in each ef- 
ficiency report. The average scores assigned 
each of the 40 reports on the two different 
occasions correlated .94. 


Discussion 


A procedure for quantitatively analyzing 
verbal material for the comparison of officer 
performance has been developed. The place- 
ments of verbal comments on a nine-point 
scale, basic to the development of the scoring 
procedure, can be reliably achieved by a 
relatively small number of judges. In agree- 
ment with these findings, Uhrbrock (8) ob- 
tained high reliabilities for the Thurstone 
scale values of descriptive rating scale state- 
ments; Hinckley (2) and Ferguson (1) found 
that scale values assigned attitudinal state- 
ments by use of the Thurstone technique were 
highly reliable. 

Three aspects of the reliability of scoring 
by the use of the scoring manual were con- 
sidered: the correlation between scale values 
assigned the comments, the correlation be- 
tween average comment scores, and the per- 
cent of agreement in the coding of comments. 

The reliability of comment scores was 
found to be higher for scores assigned by the 
same scorer on two different occasions (.95 
in one study and .96 in the other) than for 
the scores assigned by two different scorers 
(.86). Increasing the lower coefficient might
be accomplished by adding the occasional new
comments to the large sample in the manual,
and by increasing the similarity of judgmental 
standards through cooperative training and 
scoring. 

The reliability of average comment scores 
is analogous to the reliability of total scores 
on a test. In one of the studies (Scorer 2 vs. 
Scorer 2, separated by a 14 month interval), 
it was found that the reliability of average
comment scores (.94) was similar to the re- 
liability of individual comment scores (.96). 
This suggests that when the reliability of com- 
ment scores is found to be high, average com- 
ment scores will also be reliable. 

Agreement in the coding of comments may 
be considered fairly high, since in three out of 
the four studies the percentages of agreement 
were, respectively, 82, 87, and 94 per cent. 
In the remaining study, the percentage of 
agreement was 74 per cent, but this coding 
was done by the less experienced scorer,
with a 14 month interval between the scor- 
ings. This interval probably represented a 
training period in which the scorer may have 
developed more ability to code comments. 
Agreement in the coding of comments might 
be increased, in general, by giving scorers ex- 
tensive training in this aspect of the method. 

The findings on reliability of the procedures 
used here for scoring comments in efficiency 
reports suggest that this method may be use- 
ful for analyzing quantitatively other kinds 
of verbal material. In occupational situa- 
tions, supervisors usually like to make verbal 
reports; this scoring procedure will allow 
quantitative utilization of material which has 
ordinarily been merely “taken into considera- 
tion.” It is also likely that these quantita- 
tive procedures will prove useful in such fields 
as propaganda analysis, the analysis of litera- 
ture, or the analysis of the verbal reports of 
patients or interviewees. Of course, it would 
be necessary to determine the reliability, 
validity, and other relevant characteristics of 
the scores obtained in any given situation. 


Summary and Conclusions 


A method for the quantitative analysis of 
verbal material is presented. Verbal material 
is quantified by categorizing each unit of ma- 
terial (each comment) and comparing it with 
the empirically derived master scoring scale 
constructed for that category. The findings 
show that: (a) the comments in each cate- 
gory can be reliably placed on a nine-point 
scale by a relatively small number of judges; 
and (b) scores, based on either the individual 
comments in each category or the average 
comment scores for each report, are reliable.
It is suggested that the procedures developed 
here can be utilized for other types of verbal 
material. 


Received October 26, 1953. 
Standardization of the GATB for the Occupation of Tabulating
Machine Operator ¹


Minnesota State Employment Service in Cooperation with the U. S. 
Employment Service, U. S. Department of Labor, 
Washington, D. C. 


This study is concerned with the prediction
of success or failure in the occupation of
Tabulating Machine Operator. It was conducted
by the Minnesota State Employment
Service in cooperation with the United States
Employment Service (USES) and the National
Machine Accountants Association
(NMAA).

The present study is an attempt to develop
national norms for this occupation. It
is an outgrowth of previous studies conducted
in Florida, Ohio, and Minnesota, the latter
being conducted in cooperation with the
Northwest Chapter of NMAA and the University
of Minnesota.


Procedure 


The Sample. The sample in this study is com- 
posed of 203 operators employed in four states, 
viz., California, North Carolina, New Jersey, and 
Wisconsin. Of the 203 operators, 96 are women 
and 107 are men. 

All types of operations listed by the International
Business Machines Company (IBM) and
by Remington Rand were performed by operators
in the sample. Participating firms were instructed
to refer all tabulating machine operators
for testing. If this procedure was not feasible,
operators were to be selected for testing who
were representative of operators employed by the
firm with respect to age, sex, work-level, and experience.

¹ Ruth E. Potter, State Test Technician of the
Minnesota State Employment Service, had major responsibility
for the supervision of the total study
and the preparation of this article. Participating in
the development of the experimental design for the
study were Ruth E. Potter and John R. Boulger of
the Minnesota State Employment Service; Dr. Beatrice
J. Dvorak and Albert Mapou of the United
States Employment Service; and Mr. Wayne Spielman
of the National Machine Accountants Association.
At the Minnesota State Employment Service,
James Ryan and Robert Coll conducted the statistical
analysis of the data. At the national office of
the United States Employment Service in Washington,
D. C., the following persons participated in the
planning of the study, coordination of the collection
of data by the New Jersey, North Carolina,
Wisconsin and California State Employment Services,
and review of the completed study: Dr. Beatrice
J. Dvorak, Albert Mapou, Charles Meigh and
Sylvia Hoke. For the National Machine Accountants
Association, Mr. Wayne Spielman directed the
promotional activities among NMAA membership.

All operators had been employed for six 
months or longer so that they had completed the 
probationary period for this occupation. 

The Criterion. The criterion was a rating
scale² which included items considered by selected
Tabulating Machine Supervisors to be important
for successful work performance as a
Tabulating Machine Operator.

Supervisors were instructed to rate operators
in comparison with Tabulating Machine Operators
“in general.” This instruction was used to
obtain, as nearly as possible, comparability of
ratings among the participating firms. A re-rating
was conducted within a two-week period for
the purpose of determining reliability. A reliability
coefficient of .878 with a standard error
of .004 was obtained. Since re-ratings were not
available for the entire sample, the first rating
was used as the criterion.

The rating scale was composed of 8 items for 
which the rater had five choices of response indi- 
cating the degree of performance of the operator. 
Weights of 1 through 5 were assigned to these 
responses so that the minimum possible score 
was 8 and the maximum was 40. The mean 
score was 26.05 with a standard deviation of 6.7 
and the range was 8 through 40 for the sample 
of 203 operators. 

All operators having scores one standard deviation
below the mean, or lower, were placed in the
Low criterion group. Thus 37 operators
made up the Low criterion group and the remaining
166 operators the High criterion group.
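The criterion score and the Low/High split described above can be sketched as follows. The mean and standard deviation are the sample values given in the text; the function names and the treatment of a score exactly at the cutoff as "Low" are assumptions made for illustration.

```python
MEAN, SD = 26.05, 6.7      # sample mean and standard deviation reported in the text
CUTOFF = MEAN - SD         # one standard deviation below the mean


def criterion_score(item_weights):
    """Sum of the 8 item weights, each running from 1 to 5 (so 8 to 40 overall)."""
    if len(item_weights) != 8 or not all(1 <= w <= 5 for w in item_weights):
        raise ValueError("expected 8 item weights between 1 and 5")
    return sum(item_weights)


def criterion_group(score):
    """'Low' for scores at or below the cutoff, 'High' otherwise."""
    return "Low" if score <= CUTOFF else "High"
```

With these figures the cutoff falls between 19 and 20, so a rating total of 19 or less would place an operator in the Low group.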

The Predictive Instrument. The machine-scorable
form of the General Aptitude Test Battery
(GATB) was used for this experimental study.
This battery, composed of 12 tests, measures 9
aptitudes, viz., general intelligence (G), verbal
ability (V), numerical ability (N), spatial aptitude
(S), form perception (P), clerical aptitude
(Q), motor coordination (K), finger dexterity
(F), and manual dexterity (M). The general
working population norms are established on the
basis of a selected sample of 4000, stratified to
obtain proportional occupational representation
as shown by the 1940 Census of the Population.
The general-population means for aptitudes in the
battery are 100, with standard deviations of 20.

² The Staff also wishes to acknowledge the impetus
given the study by Kenneth Schenkel whose Ph.D.
research was the 1952 study. Through Dr. Schenkel
came the contacts with the NMAA and, apart from
the standard USES materials and approach, the rating
scale and accessory materials (modified) developed
by him were used for this study.

Statistical Analysis. The significance of GATB 
aptitudes for the occupation of Tabulating Ma- 
chine Operators was determined on the basis of 
mean aptitude scores, standard deviations, va- 
lidity coefficients, and job analysis data, as shown 
in Table 1. 


Results 


Aptitudes significantly related to success in 
the occupation as evidenced by high mean 
scores, low standard deviations, significant 
validity coefficients and identification through 
job analysis are: (G) general intelligence; 
(N) numerical ability; and (Q) clerical aptitude.
Spatial aptitude (S) is also related to
the occupation, as indicated by its validity
coefficient, its identification through job analysis,
and its contribution to the selective efficiency
of Aptitudes G, N, and Q.

Minimum scores for Aptitudes G, N, Q, 
and S were set approximately one sample 
standard deviation below the sample mean 
rounded to the nearest five-point score level. 
This results in norms consisting of G-95, 
N-95, S-85, and Q-100. 
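The rule for setting minimum scores can be written out as a small function. In the sketch below the sample means for G and Q are the values printed in Table 1, but the standard deviations are hypothetical placeholders chosen only to show the arithmetic; they are not taken from the study.

```python
import math


def minimum_score(sample_mean, sample_sd, step=5):
    """One sample SD below the sample mean, rounded to the nearest `step`-point level."""
    raw = sample_mean - sample_sd
    return int(math.floor(raw / step + 0.5) * step)


# Means from Table 1; the SDs of 15.0 are assumed for illustration only.
g_minimum = minimum_score(111.4, 15.0)
q_minimum = minimum_score(116.4, 15.0)
```

`math.floor(x + 0.5)` is used rather than Python's built-in `round` so that halves always round upward instead of to the nearest even number.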

To evaluate the selective efficiency of these
norms in terms of the relationship between
those operators passing and failing the norms
and those in the High and Low criterion
groups, tetrachoric correlation and Chi-square
techniques were employed. The relationship
between test norms and the criterion is shown
in Table 2.


Table 1

Means (M), Standard Deviations (S.D.), Pearson
Product-Moment Correlations with the
Criterion (r) for the Aptitudes
of the GATB

Note: N = 203 Tabulating Machine Operators.

Aptitude    M        S.D.          r
G           111.4    [illegible]   [illegible]
V           109.1    [illegible]   [illegible]
N           111.6    [illegible]   [illegible]
S           106.5    [illegible]   [illegible]
P           109.9    [illegible]   [illegible]
Q           116.4    [illegible]   [illegible]
K           112.0    [illegible]   [illegible]
F           105.6    [illegible]   [illegible]
M           106.7    [illegible]   [illegible]

* Significant at the 5% level.
** Significant at the 1% level.


Table 2

Relationship Between Pass-Fail on Test Norms Consisting
of Aptitudes G, N, S, and Q with Critical
Scores of 95, 95, 85, and 100, respectively,
and the Criterion

Group    Fail    Pass
High     40      [illegible]
Low      20      [illegible]
Total    60      [illegible]

$r_{tet} = .48$; $\sigma_{r_{tet}} = .14$
$\chi^2 = 11.643$; $p/2 = .001$

Both the Chi-square test and the tetra- 
choric correlation indicate a statistically sig- 
nificant relationship between passing the test 
norms and success on the job, as measured 
by the criterion. Fifty-four per cent of the 
Low criterion group fail the norms, while 76 
per cent of the High group pass the norms. 
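The Chi-square value and the two percentages can be checked with a short calculation. The sketch rests on assumptions that should be kept in view: the fail counts (40 and 20) are those of Table 2, but the pass counts are reconstructed by taking the Low group as 37 operators and the High group as the remaining 166 of the 203 in the sample, and the report does not say whether a continuity correction was applied. With Yates' correction these assumed counts give a value of about 11.64.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square for the 2x2 table [[a, b], [c, d]], optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Fail counts come from Table 2; pass counts are reconstructed (an assumption).
high_fail, high_pass = 40, 166 - 40
low_fail, low_pass = 20, 37 - 20

chi2 = chi_square_2x2(high_fail, high_pass, low_fail, low_pass)
pct_low_fail = 100 * low_fail / (low_fail + low_pass)       # share of Low group failing
pct_high_pass = 100 * high_pass / (high_fail + high_pass)   # share of High group passing
```

Under the same assumptions the two percentages round to 54 and 76, the figures quoted in the text.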

Cross-Validation. Previously derived norms 
based on the original Minnesota sample, and 
samples of independent studies conducted in 
Ohio and Florida were applied to the na- 
tional sample. Although these norms were 
related to job success, they were not as pre- 
dictive of job success for the national sample 
as they were for the samples from which they 
were derived. In general, the same aptitudes 
appeared to have predictive value for each of 
the studies, but some variation was found 
with respect to the critical scores obtained. 


Summary 


This study reports the development of na- 
tional norms, based on the GATB, for the oc- 
cupation of Tabulating Machine Operator. 

General intelligence, numerical aptitude, 
spatial aptitude, and clerical ability were 
found to be significantly related to success in 
the occupation. 


Received September 10, 1953. 
In connection with another project the
Hay Number Perception Test and the Wonderlic
Personnel Test were administered to
19 candidates for a special task in a life
insurance company. It was observed that
there was an extraordinary number of high
Personnel Test scores. Out of 19 people
tested 7 had scores of 40 or more, which is
above the 98th percentile in similar groups.
The mean for this group was 35.6, about the
95th percentile. This insurance company
had been using the LOMA No. 2A test, which
is designed for the selection of clerical workers.
A correlation of .66 was found between
Personnel Test and LOMA No. 2A, but an r
of only −.01 between Personnel Test and
Number Perception Test. Thus the LOMA No.
2A test seemed to be excluding any except
fairly bright applicants.

However, it is known that the LOMA No. 
2A test is a good predictor of success in sim- 
ple routine clerical work, as well as of pro- 
motability. The Number Perception Test 
has also established its efficiency in routine 
clerical selection and consistently correlates 
very low with mental ability tests. The ques- 
tion immediately presented itself as to which 
of the two tests, LOMA No. 2A or Hay 
Number Perception, would show the higher 
validity in this company in predicting speed 
of production for low-level clerks. This first 
group afforded no criterion of success, since 
it was a mixed group with the majority of the 
employees in supervisory or technical posi- 
tions. So, another group, engaged in simple 
routine clerical work, was selected in order to 
compare the validity of these two tests. The 
SRA Clerical test was also available so it was 
administered, too. The subjects were the 24 
clerks in one department, all but two in the 
lowest pay classifications and performing 
simple routine tasks. Of these, 23 were 
women and none had had any supervisory 
responsibility. Average length of service was
37.1 months, with six over 5 years, nine between
1 and 5 years, eight under 1 year and
one at five months. Correlation between
length of service and the supervisor’s ratings
described below yielded a coefficient of −.08.

The Tests. LOMA No. 2A is a test avail- 
able only to life insurance companies. It is 
an omnibus work-limit test in six parts: 
checking, directions, same-opposites, prov- 
erbs, arithmetic and spelling. Score is a 
combination of time and errors. Adminis- 
tration time averages about 35 minutes. 

Wonderlic Personnel test is a well-known 
mental ability test composed of a variety of 
verbal and numerical problems. 

SRA Clerical is in three parts, speeded and 
timed separately. Vocabulary is a 5-minute 
test of 48 items. Arithmetic allows 15 min- 
utes for 24 problems of numerical reasoning. 
Checking is a 5-minute coding test of 144 
items. 

Hay Clerical Battery is composed of three 
speeded tests of 4 minutes each. Number 
Perception has 200 pairs of three- to six-digit 
numbers, the task being to check those that 
are the same. Name Finding requires the 
subject to look at a name and remember it 
well enough to pick it out of a group of four 
similar names on the back of the sheet. 
Number Series consists of 30 simple number 
series completion problems. 

The Criterion. The criterion was the aver- 
age of the ratings made by the department 
head and assistant department head. They 
were made about three weeks apart and 
wholly independently. The rating method
employed three rating principles in combination:
graphic scale, man-to-man comparison
and forced distribution. All 24 names were
listed on a single rating sheet described as
“Speed of Working” and the rater was asked
to place a check mark on the line opposite
each employee’s name in such a way that approximately
one-half of the names were
checked in a vertical band designated as
“Average,” about one-fourth “Above Department
Average” and about one-fourth
“Below Department Average.” Distinctions
among employees in each of these three 
groups were to be indicated by the relative 
positions of the check marks along the lines. 
After the ratings were completed the value 
of each mark was measured on a scale rang- 
ing from 0 to 40, this particular scale being 
arbitrary. 

The product-moment correlation between 
the two sets of ratings yields an r of .89, in- 
dicating a highly reliable criterion. 


Results 


The second column in Table 1 shows the 
correlations between scores on the various 
tests and the average of the scaled values of 
the two ratings. These coefficients point to 
the greater efficiency of four of the tests, but 
such coefficients cannot always be relied upon, 
especially in so small a sample, because re- 
gression is not always rectilinear. 

The third column of Table 1 shows the 
best cutting score on each single test or com- 
bination of tests. The 24 cases fell into five 
groups as rated by the two supervisors. 


Edward N. Hay 


Group I was rated “Good” by both raters; 
group II was rated “Good” by one rater and 
“Average” by the other, etc. The first three 
groups were considered “Good.” Groups IV 
and V were considered “Poor” since one or 
the other rater had so classified all mem- 
bers. Cutting scores were selected by in- 
spection which would admit the greatest num- 
ber of subjects rated “Good” to that group 
and exclude the greatest possible number
rated “Poor.”

The only combination of tests which would 
increase predictive efficiency was Number 
Perception and Name Finding, which cor- 
rectly assigned 21 out of 24 subjects to the 
proper group, “Good” or “Poor.” This was 
significantly better than chance at the one 
per cent level. 
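One way to gauge how unlikely 21 correct assignments out of 24 would be by chance is a one-sided binomial tail with a chance rate of one-half (12 of 24). The report does not state which significance test was actually used, so the sketch below is an assumption and need not reproduce the significance levels listed for the other tests.

```python
from math import comb


def binomial_tail(successes, n, p=0.5):
    """P(X >= successes) when X follows a binomial distribution with n trials and rate p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(successes, n + 1))


# Probability of 21 or more correct assignments out of 24 at the chance rate.
p_value = binomial_tail(21, 24)
```

Under this assumption the tail probability is far below .01, in line with the one per cent level stated above.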


Discussion 


This study confirms other similar studies
with some of the same tests in showing that
prediction of success in low-level routine
clerical work is usually more efficiently accomplished
by tests based on what appears
to be speed of perception than by tests involving
primarily reasoning problems (1, 2,
3, 4, 5, 6, 7, 8).


Table 1

Predicting “Speed of Work” from Test Scores

                                                        Cutting       Correct
Test                                     r              Score         Selection¹   Significance
Wonderlic Personnel                      [illegible]    22            16 of 24     No
SRA Clerical: Vocabulary                 [illegible]    29            15 of 23     No
SRA Clerical: Arithmetic                 [illegible]    13            15 of 23     No
SRA Clerical: Coding                     [illegible]    72            18 of 23     .05
Hay: Number Perception                   [illegible]    115           18 of 24     .10
Hay: Name Finding                        [illegible]    19            18 of 24     .10
Hay: Number Series                       .04            19            18 of 24     .10
LOMA No. 2A²                             [illegible]    89            18 of 24     .10
Hay: Number Perception + Name Finding    [illegible]³   [illegible]   21 of 24     .01
SRA Clerical: Vocab., Arith., Coding     [illegible]    29, 13, 72    18 of 23     .05

¹ Chance would give a correct selection of 12 out of 24. Perfect selection would be 24 out of 24.
² No correction has been made for possible restriction of range due to the use of this test in original selection.
Range in sample however was as great as for other tests given, judging by reference to published tables of norms.
³ Multiple R.

It is worthy of note that the most efficient 
tests were also the briefest: Number Per- 
ception, Name Finding and SRA Checking. 
This points to the wasted effort of giving a 
large battery of tests, tests with long time 
requirements or an omnibus test, where some 
material may be only dead wood and may 
even reduce the efficiency of the whole test. 
Time is not very important in school situa- 
tions but in industry it is critical, both for 
maintaining good public relations and in re- 
ducing the direct costs of testing. 

Warning has already been given against 
placing complete reliance on product-moment 
coefficients of correlation, on the ground that 
if regression is not rectilinear the coefficient 
may thereby be lower than would be ex- 
pected. Table 1 affords an example. Num- 
ber Series correctly selects 18 out of 24 cases, 
the same figure achieved by three other 
tests and nearly as high as a fourth; yet the 
r is only .04, whereas the others are between 
.54 and .64. An examination of the scatter diagrams
provides the explanation: regression
follows a U-shaped course for Number Series 
but is almost perfectly rectilinear for the other 
four tests. 
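The point about non-rectilinear regression is easy to illustrate with invented numbers. In the sketch below the ratings depend perfectly on the test scores, but along a U-shaped curve, and the product-moment coefficient is zero; a straight-line relation over the same scores gives a coefficient of 1. The data are hypothetical and are not the Number Series scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


scores = [-3, -2, -1, 0, 1, 2, 3]            # hypothetical centered test scores
u_shaped = [s ** 2 for s in scores]          # ratings high at both extremes
straight = [2 * s + 1 for s in scores]       # ratings rising steadily with score

r_u = pearson_r(scores, u_shaped)
r_line = pearson_r(scores, straight)
```

A cutting score can still separate groups well in the U-shaped case, which is why the correct-selection counts and the coefficient can disagree.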


Received May 28, 1954. 
Early publication. 
A Sales Comprehension Test *


Martin M. Bruce 
Dunlap and Associates, Inc., Stamford, Conn. 


One of the areas in industry where testing 
has been carried on extensively is sales. 
However, very few instruments dealing with 
selling have been published and are available 
for general distribution. 

The reader will find an excellent review of 
the literature on selection of sales personnel 
in Husband’s article (3). Since that pub- 
lication Rock (4) has published an article 
on his Sales Situations Test. Rock reported 
on just two small sales groups in describing 
the test, one consisting of 25 subjects, the 
other 31 subjects. The instrument attempts 
to present “live” situations, with items in 
multiple choice format. The idea has long 
appeared to the writer to be a sound one. 

Because of the apparent dearth of testing 
material in a field where a great many men 
are tested, the writer set about in 1946 to 
devise a test that would aid in measuring 
potentiality for success in selling. 


Problem 


The problem was one of constructing a test 
that would aid in predicting success in sell- 
ing. Specifically, the test was to be one that 
would be directly applicable to the whole- 
sale sales field in general. 


Procedure 


In 1946 an experimental form of the test was 
prepared in mimeograph format. This instru- 
ment contained 74 items. The items were con- 
structed with the aid of salesmen in various 
fields, business men in occupations related to 
selling, industrial psychologists, and literature on 
selling. 

The 74 items were administered to salesmen in 
various fields throughout the country as well as 
to individuals in occupations other than sales. 
The 50 items that differentiated best between 
the sales and non-sales groups were retained and 
published as the Aptitudes Associates Test of 
Sales Aptitude (Principles of Selling) Form A 
(2). 


* The Sales Comprehension Test, Form M by Mar- 
tin M. Bruce is obtainable from the author at 71 
Hanson Lane, New Rochelle, New York. 


Additional data on 1,404 cases were collected 
on the 50-item form. These cases consisted of 
1,007 non-salesmen and 397 salesmen. The non- 
salesmen consisted of individuals applying for all 
types of jobs other than sales with companies in 
the East and Midwest, students studying psychol- 
ogy in New York and New Jersey colleges, vo- 
cational guidance clients in New York City and 
men in various non-sales jobs throughout the 
country. The sales group consisted of 55 sales- 
men of major and small electrical appliances in 
cities in Ohio and Connecticut; 86 salesmen and 
sales managers of electronics products located in 
practically all common distribution centers in the 
United States; 19 metropolitan New York sales- 
men of office dictating equipment; 13 salesmen 
of hardware products located in Southern and 
Midwest locations; and 224 other individual 
salesmen in a wide variety of fields located in all 
sections of the country. This last group in- 
cluded salesmen in the following fields: office 
supplies, whiskey, beer, soap, razor blades, foun- 
tain pens, automatic pencils, clothing, textiles, 
furniture, dairy products, advertising space, 
pharmaceuticals, books, materials handling prod- 
ucts, machinery, and a number of others. Phi 
coefficients were computed for the 50 items on 
the basis of the above samples. The 30 “best” 
items were retained and published as the Sales 
Comprehension Test, Form M. 
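The item analysis can be sketched with the usual phi coefficient for a fourfold table. The counts below are hypothetical (only the group sizes of 397 and 1,007 come from the text), and ranking items by phi to pick the "best" ones is an assumption; the report does not give the exact selection rule.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table.

    a, b: salesmen giving / not giving the keyed answer
    c, d: non-salesmen giving / not giving the keyed answer
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


# Hypothetical response counts for three items (397 salesmen, 1,007 non-salesmen).
items = {
    "item_01": (300, 97, 400, 607),
    "item_02": (200, 197, 500, 507),
    "item_03": (250, 147, 450, 557),
}

# Rank items from most to least discriminating.
ranked = sorted(items, key=lambda name: phi_coefficient(*items[name]), reverse=True)
```

Keeping the first 30 entries of such a ranking would correspond to retaining the 30 items that best separate the two groups.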

A cross-validation study was conducted by ad- 
ministering the 30-item form to 661 additional 
non-salesmen and 334 salesmen. The non-sales- 
men were in 22 different states and filled 21 dif- 
ferent jobs. The salesmen were employed in 18 
different states and were employed in 11 differ- 
ent sales fields. 

An additional validity study was conducted 
with a group of 82 sales managers employed 
throughout the United States by a door-to-door 
cosmetics sales firm. These sales managers su- 
pervise a group of full and part time saleswomen 
who sell on a commission basis. 


Results 


Validity. Computations for the sales and
non-sales groups containing the original 397
and 1,007 cases, respectively, yielded a t of
13.1. This finding suggests that there is less
than one chance in 100 that a difference this
large between the sample means would arise
by chance. However, this measure of difference
is spuriously high since it is based on the same
population from which the Phi coefficients
were computed. The means were 30.8 for the
sales population and 11.1 for the non-sales
population. The SD’s were, respectively, 13.8
and 18.7.

The cross-validation populations of 661 
non-salesmen and 334 salesmen yielded a t of 
5.8. This statistic brings us beyond the 1% 
level of confidence in assuming that the sales 
and non-sales populations are not similar in 
their responses on this test. This is an in- 
dication of the test’s status validity (5). 
Overlapping amounts to 19% in these popu- 
lations, this percentage of the non-sales group 
equalling or exceeding the median of the sales 
group. In this cross-validation population 
the means of the sales and non-sales groups 
were, respectively, 28.9 and 12.2; the sigmas 
were 12.2 and 16.9; the medians 29 and 12. 
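
The overlap measure used here (the percentage of one group equalling or exceeding the median of the other) is simple enough to state as code. The sketch below is illustrative only: the function name is an assumption and the score lists are made up, since the article reports only summary figures.

```python
from statistics import median

def overlap_percent(comparison_scores, reference_scores):
    """Percentage of comparison_scores at or above the median of reference_scores."""
    reference_median = median(reference_scores)
    at_or_above = sum(1 for s in comparison_scores if s >= reference_median)
    return 100.0 * at_or_above / len(comparison_scores)

# Hypothetical scores (not data from the article).
sales = [10, 22, 29, 35, 41]
non_sales = [-5, 3, 12, 12, 18, 29, 30, 8, 0, 15]
overlap = overlap_percent(non_sales, sales)
print(overlap)
```

A lower percentage means better separation of the two groups; 50% would mean the test does not separate them at all.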

In a study conducted with the sales force 
of a nation-wide electronics sales firm six 
tests were completed by the 86 salesmen and 
sales managers. These included personality
inventories, mental ability and other ability
tests and an interest inventory. The Sales
Comprehension Test correlated higher with
the rating criterion than any of the other
six tests employed in the battery. The r was
.32. The criterion has an uncorrected odd-even
reliability of .92.

In this group the mean scores for the 77 
salesmen and 9 sales managers were com- 
pared by computing t. The t of 2.4 is sig- 
nificant at the 2% level. There is a 31% 
overlap here, using the same overlap measure 
as above. Assuming that sales managers as 
a group have better sales comprehension than 
salesmen, the indication that the Sales Com- 
prehension Test measures this difference 
further suggests validity for the test. 

Scores on this test were correlated with 
final grades of 27 students studying sales- 
manship at Rutgers University. The r 
proved to be .68, suggesting that this test 
measures comprehension similar to that gained 
by students studying salesmanship in school. 

Correlation with Intelligence. It is a com- 
mon research finding that abilities tend to be 
positively correlated. A particularly frequent 
finding is that tests employed in the same 
situation tend to correlate positively with 
each other and especially with tests of intelligence
(1). Statistical analysis usually
reveals that various paper and pencil tests 
actually measure to a significant extent what 
intelligence tests measure. Therefore, it is 
important to know the extent to which this 
test is related to measures of intelligence. 

A correlation was run between the total 
score on the Sales Comprehension Test and 
the total score on the Otis Self-Administering 
Test of Mental Ability, Higher Examination: 
Form A. The correlation based on a sample 
of 387 men, women and salesmen was -.19.
This group was composed of college psychology
students studying testing, job applicants
and vocational guidance clients. In this
group the standard deviation of Otis raw 
scores was 9.7 and the standard deviation of 
Sales Comprehension Test raw scores was 
18.7. The means were, respectively, 56.7 and 
19.8. 

Further research was conducted with the 
aid of Thurstone’s Primary Mental Abilities 
Test which contains five factors. The 173 
subjects include 159 men and 14 women. 
All but four of the men and two of the 
women were evaluated for clerical, sales, 
managerial or engineering positions with vari- 
ous firms in the East. 

The findings appear in Table 1. 

The Sales Comprehension Test score mean 
and standard deviation were, respectively, 
17.2 and 16.9. 

Table 1

Relationships Between Sales Comprehension
Test and PMA Test

                                          r with
PMA Factor        N     Mean    Sigma   Sales Test
Verbal Meaning   173    40.4     7.8       .06
Space            173    24.6     9.5       .20
Reasoning        173    17.2     4.9       .05
Number           170    46.7    11.5      -.08
Word Fluency     170    37.0    14.2       .02
Total Score      170   224.9

Since all of these correlations are
close to zero and the correlation with
the Otis is low and negative, it appears
justified to state that measures of various
intelligence factors and the Sales Comprehension
Test are not related. The Sales Comprehension
Test appears to measure something other
than intelligence.

Correlation with Persuasive Preference. 
There appears to be a positive linear rela- 
tionship between persuasive preference as 
measured by the Kuder Preference Record, 
Form CH and performance on the Sales Com- 
prehension Test. Data on these two tests 
were obtained for 146 non-salesmen and 54 
salesmen. The r proved to be .39, significant 
at the 1% level. The standard deviation of 
persuasive preference scores was 15.8 while 
the standard deviation of Sales Comprehen- 
sion Test scores was 17.1. The respective 
means were 37.7 and 17.8. 

The modest but positive and significant r 
between sales score and persuasive score is 
in keeping with the concept that people tend 
to learn in areas in which they are interested. 

Reliability. Reliability data in the form
of tests and retests were obtained from 103
college students. Scores ranged from -36
to 47. The mean of the first testing was 10.4
and for the second it was 11.1. The standard
deviations were, respectively, 15.8 and 14.9.


The test-retest reliability coefficient for this 
group was .71. Because this is a restricted 
group with respect to range of scores, it 
seems likely that the true test-retest relia- 
bility coefficient for the entire population, in- 
cluding salesmen, is somewhat higher. The r 
is .79 when corrected for homogeneity. 
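
The article does not say which correction for homogeneity was applied. As a minimal sketch, assume the common formula for a reliability coefficient obtained in a group of restricted range, R = 1 - (s^2 / S^2)(1 - r), where s is the standard deviation of the restricted group and S that of the wider group. Taking s = 15.8 (first testing) and S = 18.7 (the standard deviation reported earlier for a broader sample) is an assumption made only for illustration.

```python
def correct_for_restricted_range(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate reliability in a wider group from a restricted-range coefficient.

    Assumes the error variance is the same in both groups, so that
    1 - R = (sd_restricted / sd_unrestricted) ** 2 * (1 - r_restricted).
    """
    if sd_unrestricted <= 0:
        raise ValueError("sd_unrestricted must be positive")
    variance_ratio = (sd_restricted / sd_unrestricted) ** 2
    return 1.0 - variance_ratio * (1.0 - r_restricted)

# Inputs: r = .71 from the text; both SD choices are assumptions.
corrected = correct_for_restricted_range(0.71, 15.8, 18.7)
print(round(corrected, 2))
```

Under these assumed inputs the estimate rounds to .79; other plausible choices of the two standard deviations give somewhat different values.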


Summary 


An experimental form of a test to aid in 
selecting and evaluating salesmen was pre- 
pared in 1946. Preliminary validity data led 
to the elimination of 24 of the 74 multiple 
choice items. Over a period of five years 
data were collected on the 50-item form. 


Martin M. Bruce 


Data on 1,398 cases indicated that there 
were 30 items that significantly and reliably 
differentiated salesmen from non-salesmen. 
These items have been combined to form the 
Sales Comprehension Test, Form M. This 
test was cross validated on a supplementary 
population. 

This instrument proved to be the most 
valuable in predicting success among sales- 
men and sales managers in a national sales 
organization. The test correlated signifi- 
cantly with final grades in a class in sales 
principles. The instrument, unlike other 
paper and pencil tests, does not measure in- 
telligence to any extent. People who show 
high preference for persuasive activities tend 
to do better on this test. The test appears 
capable of differentiating good from poor 
sales personnel. 

A test-retest reliability coefficient, .79 cor- 
rected for homogeneity, is sufficiently high 
for group situations to warrant confidence in 
its consistency of measurement. 

The Sales Comprehension Test, Form M, 
appears to be an instrument that can be 
utilized in sales selection and evaluation 
situations. Its validated item content also 
lends itself to sales training situations. 


Received November 6, 1953. 
A New Method for Obtaining Weighted Composites of Ratings


H. F. Dingman and J. P. Guilford 


University of Southern California 


In spite of the many weaknesses of ratings 
of personnel obtained in the practical situation,
they still often remain the only criterion
against which to validate predictive meas- 
ures. It is therefore important that we cor- 
rect for weaknesses wherever we can in order 
to achieve the best information obtainable con- 
cerning the validity of selection instruments. 

It has been amply demonstrated that in 
order to obtain increased reliability, and 
hence also probably increased validity, of 
criterion ratings, it pays to combine ratings 
from several raters. One of the common dif- 
ficulties in this connection, however, is that 
no rater is acquainted with all the ratees in 
the experimental group. At best, not all 
raters are equally well informed concerning 
all ratees. It is also true, even when raters 
know ratees fairly well, that each rater uses 
different information and rates on different 
qualities. Under such conditions not all rat- 
ings should be given equal weight in forming 
composites. This report is concerned with 
the development of a method of weighting ob- 
tained ratings in terms of two rater character- 
istics. One is the rater’s tendency to rate on 
qualities in common with other raters and the 
other is the rater’s degree of confidence in his 
rating of particular individuals. 

The problem of weighting ratings arose in 
connection with a project on the validation 
of a new testing instrument designed for the 
selection of personnel who come under the 
general category of Psychiatric Technicians 
(Ward Aides) serving in a state institution.1
A total of 716 such personnel in the same 
institution were under study. Each one had 
been rated by four different supervisors who 
had been in positions favorable for observing 
their performances. A graphic rating was 
given on a line seven centimeters long under 
the instruction to rate for general effectiveness
on the job. Each rater also gave a rating
on a similar line indicating his own degree
of assurance that his rating of effectiveness
was correct.2 Some of these ratings would
be zero or near zero where the raters felt
that they had little or no basis for making the
rating of effectiveness.

1 This study was done as part of a project on the
selection of Psychiatric Technicians, supported by a
grant from the U.S. Public Health Service in contract
with the Pacific State Hospital, Spadra, California.


Intercorrelations 


Before adopting any system of weighting 
the ratings to form a composite criterion 
measure, we decided to obtain as much in- 
formation as possible concerning the proper- 
ties of the ratings. This was accomplished 
through intercorrelations of raters and fac- 
tor analyses of both the effectiveness and the 
assurance ratings. 


Table 1


Intercorrelations of Effectiveness Ratings
(N = 716)


Rater      B      C      D
A         .54    .16    .53
B                .11    .45
C                       .08


Table 1 shows the intercorrelations among 
the four raters, using the effectiveness ratings 
of all 716 employees. It is obvious that raters 
A, B, and D show about the same level of 
inter-rater agreement on ratings of effective- 
ness, while rater C shows little agreement 
with any of those three. 

The factor analysis of the correlation matrix
was carried out by the centroid method,
with iterative solutions until communalities
were stabilized. The results appear in Table
2. Here it is seen, first, that one common
factor is sufficient to account for the
intercorrelations. It can also be seen that rater
A has definitely the highest communality.
This is significant in view of the fact that A
was a supervisor who makes the major decisions
concerning work assignments and inter-ward
transfers. Rater C had very little
in common with the other raters. Whether
this means that C did not know essentially
the same employees as the others or rated
them on different qualities we cannot tell
from this information alone. Taken at its
face value, we might well conclude that C’s
ratings should receive less weight in a composite,
if they were used at all.


Table 2


Loadings in the Single Common Factor in the Four
Raters’ Ratings of Effectiveness


          Factor      Communality
Rater     Loading        h^2
A          .83           .68
B          .68           .46
C          .16           .03
D          .64           .41


2 The suggestion for obtaining ratings of assurance
was made by Dr. Anna Shotwell of the Pacific State
Hospital staff.

The intercorrelations of assurance ratings 
are shown in Table 3. Since assurance may 
be assumed to be highly correlated with the 
degree of acquaintance between rater and 
ratee, we may conclude from Table 3 that 
raters B and C had the least in common with 
respect to ratees whom they knew or did not 
know. Rater A, who had the greatest com- 
munality in her ratings of effectiveness, knew 
more of the ratees in common with D than 
with B and C. The factor analysis gave a 
structure with two common factors. 

Taking this information together with that
from the analysis of the effectiveness ratings,
we conclude that C’s lack of communality
with the other raters was not due to the fact
that he knew different employees. C disagreed
with the other raters generally as to
relative effectiveness of employees that were
rated in common. This disagreement could
mean that C emphasized different qualities
or it could mean that he rated the same
qualities but made different evaluations of
employees with respect to those qualities.
One might conclude that C’s ratings are so
inconsistent with those of the consensus that
they should not be included in a composite.
On the other hand, perhaps C had some neglected
valid qualities or some better evaluations
to contribute. The best solution seems
to be to include C’s ratings for what they are
apparently worth, that is, to give them a
relatively low weight.


Table 3


Intercorrelations of Ratings of Assurance Connected
with the Effectiveness Ratings

(table body missing in source)


Table 4


Rotated Factor Loadings of the Raters with Respect
to Their Ratings of Assurance


             Factors         Communality
Rater      I       II           h^2
A         .31     .60           .46
B         .06     .33           .11
C         .72     .00           .52
D         .64     .49           .66


The Weighting System 


The weighting system we propose and that 
we have used in connection with each of the 
Psychiatric Technicians takes into account 
two variables. One is the factor loading of 
each rater in the single common factor in 
the effectiveness ratings. Each rater’s ratings,
regardless of ratee, are multiplied by this
weight. The other weight is the rater’s rating
of assurance that he applied to each ratee.
The over-all weight to be applied to each 
effectiveness rating is therefore a product of 
these two values. The composite rating for an 
employee is a weighted mean of the four effec- 
tiveness ratings given him by the four raters. 

In order to state more explicitly how the 
weighted mean is computed we define the 
following symbols: 


Let

$X_{ik}$ = rating of effectiveness of individual I
given by rater K,

$A_{ik}$ = rating of assurance that rater K makes
concerning his rating $X_{ik}$ of individual I,

$F_k$ = general-factor loading of rater K in
his effectiveness ratings,3

and

$\bar{X}_i$ = weighted mean of effectiveness ratings
for individual I.


The equation reads

$$\bar{X}_i = \frac{\sum_k A_{ik} F_k X_{ik}}{\sum_k A_{ik} F_k} \tag{1}$$

The summations in both numerator and
denominator are over all raters.
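
A minimal sketch of equation (1) for a single ratee follows. The loadings are those read from Table 2 for raters A through D; the effectiveness and assurance ratings are hypothetical values on the seven-centimeter scale, and the function name is an assumption made for illustration.

```python
def weighted_composite(ratings, assurances, loadings):
    """Weighted mean of one ratee's ratings: sum(A*F*X) / sum(A*F) over raters."""
    if not (len(ratings) == len(assurances) == len(loadings)):
        raise ValueError("ratings, assurances and loadings must have equal length")
    weights = [a * f for a, f in zip(assurances, loadings)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero; the composite is undefined")
    return sum(w * x for w, x in zip(weights, ratings)) / total_weight

loadings = [0.83, 0.68, 0.16, 0.64]   # raters A, B, C, D (Table 2)
ratings = [5.0, 4.0, 2.0, 6.0]        # hypothetical effectiveness ratings
assurances = [6.0, 3.0, 1.0, 0.0]     # hypothetical; rater D reports no basis
composite = weighted_composite(ratings, assurances, loadings)
print(round(composite, 3))
```

Because rater D's assurance is zero in this example, D's rating drops out entirely, and rater C's rating counts for little because both C's loading and assurance are low.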


Reliability of the Composites 

In order to determine whether the weight- 
ing system leads to improvement over a 
simple summation or average of ratings, we 
have made a reliability study of composites 
derived with and without weights. Relia- 
bility is defined here as inter-rater consistency 
or inter-composite consistency. It is im- 
possible to estimate the reliability of com- 
posites of all four ratings, but it is possible 
to estimate reliabilities for composites of two 
raters at a time. Consequently, the raters 
were combined in all possible pairs of two
and for each pair a weighted and an un- 
weighted composite were computed for each 
ratee. The three possible intercorrelations of 
such composites, weighted and unweighted, 
are given in Table 5, based upon 50 ran- 
domly selected ratees. In every case, the 
weighted composites show a higher inter- 
correlation; in two cases very much higher. 
We have not applied the Spearman-Brown 
formula to estimate the reliability of com- 
posites of four raters for the reason that the 
conditions for applying that formula are 
probably not satisfied. The chances are that 
the reliability of such a weighted composite 
would be higher than any of the estimates in 
Table 5. This would indicate that in combining
the four ratings for each employee,
with the weights, we have a criterion that is
sufficiently dependable for use in a validation
study. In view of the two very low
correlations for the unweighted composites,
there is some question as to whether an unweighted
composite of four ratings would be
sufficiently dependable to serve as a criterion.


Table 5


Correlations Between Unweighted and Weighted Composites
of Ratings of Effectiveness Assigned
by all Possible Pairs of Raters *


Pairs of         Unweighted      Weighted
Raters           Composites      Composites
AB vs. CD          -.04            .54
AC vs. BD           .58            .64
AD vs. BC           .18            .54


* From a random sample of 50 ratees.


3 If there should be more than one common factor
in such an analysis, the investigator has at least two
alternatives. One would be to use the first centroid
factor loadings. This would be preferred when other
factors are particularly weak. The other alternative
would be to use the loadings from each factor (after
rotation) separately as a set of weights and to compute
a criterion measure corresponding to each factor.
Unless these weights were very different, however,
the two criteria would be highly correlated.
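
The reliability check described in this section can be sketched as follows: form a composite from each of two disjoint pairs of raters, then correlate the two composites across ratees, once without and once with weights. All ratings and assurances below are hypothetical, the function names are assumptions, and only the loadings are taken from Table 2.

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pair_composite(ratings, assurances, loadings, raters, weighted):
    """Composite of one ratee's ratings over a subset of raters."""
    if not weighted:
        return sum(ratings[k] for k in raters) / len(raters)
    weights = {k: assurances[k] * loadings[k] for k in raters}
    return sum(weights[k] * ratings[k] for k in raters) / sum(weights.values())

def split_correlation(all_ratings, all_assurances, loadings, first, second, weighted):
    """Correlate composites from two disjoint rater pairs across ratees."""
    x = [pair_composite(r, a, loadings, first, weighted)
         for r, a in zip(all_ratings, all_assurances)]
    y = [pair_composite(r, a, loadings, second, weighted)
         for r, a in zip(all_ratings, all_assurances)]
    return pearson(x, y)

LOADINGS = {"A": 0.83, "B": 0.68, "C": 0.16, "D": 0.64}  # Table 2
# Hypothetical effectiveness and assurance ratings for five ratees.
RATINGS = [
    {"A": 5, "B": 4, "C": 2, "D": 6},
    {"A": 3, "B": 3, "C": 6, "D": 2},
    {"A": 6, "B": 5, "C": 1, "D": 5},
    {"A": 2, "B": 3, "C": 4, "D": 3},
    {"A": 4, "B": 6, "C": 5, "D": 4},
]
ASSURANCES = [
    {"A": 6, "B": 3, "C": 1, "D": 5},
    {"A": 5, "B": 4, "C": 2, "D": 6},
    {"A": 6, "B": 5, "C": 1, "D": 4},
    {"A": 4, "B": 2, "C": 3, "D": 5},
    {"A": 5, "B": 6, "C": 2, "D": 3},
]

for first, second in (("AB", "CD"), ("AC", "BD"), ("AD", "BC")):
    unweighted = split_correlation(RATINGS, ASSURANCES, LOADINGS, first, second, False)
    weighted = split_correlation(RATINGS, ASSURANCES, LOADINGS, first, second, True)
    print(first, "vs.", second, round(unweighted, 2), round(weighted, 2))
```

With only five made-up ratees the printed correlations carry no evidential weight; the sketch shows the procedure, not the finding.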


Summary 


This article faces two problems: (1) the 
fact that different raters in a practical situa- 
tion do not know employees equally well and 
thus cannot rate them with equal assurance; 
and (2) the fact that raters differ with re- 
spect to how well they reflect the consensus 
of the group of raters. The ratings of ef- 
fectiveness of 716 hospital employees given 
by four supervisors were studied by factor 
analysis to determine what their consensus 
indicated. One common factor, in which raters 
had quite different factor loadings, was suf- 
ficient to account for the intercorrelations of 
effectiveness ratings. In rating each em- 
ployee, each rater also gave a rating of de- 
gree of his assurance of his correctness. 

A factor analysis of these assurance rat- 
ings gave two common factors, which were 
taken to indicate communalities of acquaint- 
ance with the employees. The results of the 
two factor analyses led to the inclusion of the
ratings of all raters and to the use of weights 
in forming composite ratings. One weight 
was the factor loading of the rater obtained 
from intercorrelations of the effectiveness 
ratings. The other weight was the rating of 
degree of assurance. The composite was a 
weighted mean of the four ratings of each 
employee. It was demonstrated that weighted 
composite ratings based on this principle were 
definitely more reliable than corresponding 
unweighted composites. 


Received October 11, 1953. 
In developing a quantitative scoring procedure
and in selecting items for a test which
elicits item responses that cannot be readily 
classified as correct or incorrect, test con- 
structors have customarily used one or a 
combination of three procedures. The first 
procedure is that of having authorities or 
“juries” select the item response which they 
believe parallels a definition of the behavior 
being evaluated. The resulting individual 
item validity and total test validity are thus 
dependent upon the judges’ interpretation of 
the defined behavior. The second procedure 
is that of assigning larger values to those re- 
sponses internally consistent with the total 
score. The validity of items keyed and se- 
lected according to this procedure depends 
upon the validity of the total score. The 
third procedure involves constructing a key 
and selecting items after correlating each 
possible item response with an external cri- 
terion, usually some behavior display or rat- 
ing of the subjects. Insofar as subsequent 
prediction of behavior external to the test 
score is concerned, the external criterion tech- 
nique contains inherent advantages. If the 
criterion against which the items have been 
validated is a heterogeneous criterion, how- 
ever, the test will also tend to be hetero- 
geneous. Item selection techniques proposed 
by Horst (6), Gulliksen (5), Davis (1), and 
French (4), which combine elements of the 
second and third procedures, tend to reduce 
the heterogeneity of the test. 

When a battery of tests is used for the 
prediction of a criterion, maximum predictive 
effectiveness will occur when each test in the 
battery has a high correlation with the criterion
and a low intercorrelation with the
other tests in the battery. Thus if a new 


test is to be combined with previously avail- 
able tests for the prediction of some criterion, 
then the items in the new test should meas- 
ure some part of the criterion not already 
being measured. When individual items of 
a new instrument are validated against the 
criterion, the test constructor is usually as- 
sured of some subsequent predictive effec- 
tiveness when the test is used singly for pre- 
diction. If a test validated in such a manner 
is added to a battery, however, the test con- 
structor has no assurance that the test will 
increase the total predictive effectiveness of 
the battery. The reason for this lack of as- 
surance may be that the extent to which the 
items in the new test intercorrelate with the 
other tests in the battery has not been con- 
sidered. The desirability of a technique 
which takes into consideration that variation 
in the criterion already associated with other 
prediction variables is readily apparent. 
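
The point about intercorrelation can be made concrete with the standard formula for the multiple correlation of a criterion with two predictors. The sketch below uses illustrative correlations, not values from this study, and the function name is an assumption.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Same validities, different overlap between the predictors (illustrative values).
low_overlap = multiple_r_two_predictors(0.45, 0.45, 0.30)
high_overlap = multiple_r_two_predictors(0.45, 0.45, 0.70)
print(round(low_overlap, 3), round(high_overlap, 3))
```

With equal validities, the battery whose tests overlap less predicts the criterion better, which is the motivation for keying a new test against what the existing battery leaves unexplained.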

It was the purpose of this study to de- 
termine the relative effectiveness of: (1) key- 
ing the items of a new inventory to be added 
to a test battery in terms of their correlation 
with the total variation of an external cri- 
terion; and (2) keying the same items in 
terms of their correlation with the criterion 
variation unexplained by other tests in the 
battery. 


The Techniques 


The External Criterion Technique. Be- 
cause the external criterion technique for 
keying item responses to attitude, interest, 
personality, and biographical data instruments
has been widely used for the past
twenty years, specific instances of its applica- 
tion will not be cited here. Essentially this 
technique consists of obtaining the correlation
between each item response of a key
group and a criterion. If a test contains 20 
items each having four possible responses, 
then 80 correlations are obtained. The key 
is constructed by assigning quantitative values 
to subsequent responses according to the size 
and/or direction of the correlations. With 
such a procedure more than one response can 
be scored for each item. The total score for 
subsequently administered tests is then ob- 
tained by summing the values assigned to 
the item responses of each subject according 
to the key. The desirability of checking the 
validity of the total score for members of an- 
other sample, independent of the key group, 
should be obvious. 

The Deviate Technique. The procedure 
of keying item responses according to their 
correlation with the unexplained criterion 
variation, here referred to as the deviate tech- 
nique, is much less well known than the ex- 
ternal criterion technique. Instances in which 
the deviate technique has been used include 
the research of Neidt and Merrill (9) and 
Neidt and Edmison (10). In an article pub- 
lished in 1951, Meyers and Schultz (8) de- 
scribed a modified version of this technique, 
and in an article appearing in 1953, Schultz 
and Green (11) reported the use of the devi- 
ate technique in a way similar to that used 
in the present study. 

In constructing a key with the use of the 
deviate technique, the responses of a key 
group to each item are correlated with that 
part of the criterion variation which is not 
associated with other test scores in a bat- 
tery. In the analysis of regression of a test 
battery and a criterion, the criterion variance 
unexplained by other tests can be expressed 
for any group as follows: 


$$\sum y^2 - \left[a_1 \sum x_1 y + \dots + a_n \sum x_n y\right]$$


where $\sum y^2$ is the criterion sum of squares, the
$\sum xy$’s are the sums of the cross products of
the test scores and the criterion in deviation 
form and the a’s are regression weights de- 
termined by least squares. The foregoing ex- 
pression can be readily changed to raw score 
form. For any individual in the group for 
which the regression weights have been determined,
an indication of the unexplained
variation may be obtained from $Y - \hat{Y}$, in
which $Y$ is the actually obtained criterion
measure, and $\hat{Y}$ is the criterion measure predicted
for this individual from scores in the
test battery. After prediction and subtrac- 
tion from the actual criterion measures have 
been made for each individual in a key group, 
a distribution can be formed which represents 
that variation in the criterion that is unac- 
counted for by the tests in the battery. This 
distribution will be distributed around zero 
and its shape, although influenced by the 
shape of the criterion distribution, will tend 
toward normality. It is this distribution of 
actual-minus-predicted criterion measures with 
which item responses are correlated in the 
use of the deviate technique. 


Procedure 


Collection of Data. A 201-item life experi- 
ence and attitude toward education inventory, 
constructed by Malloy (7), was administered to 
309 freshman women entering the University of 
Nebraska in September, 1952. Of the 201 items 
in the inventory, 112 were of the multiple-choice 
type and 89 were of the paired-statement type. 
The items were designed to reflect experiences 
and attitudes in four areas, viz., school experi- 
ences and attitudes toward education, self ap- 
praisal, family relationships, and choice of friends. 

The 309 students were subdivided into two in- 
dependently drawn random subsamples of 155 
and 154 students each. The sample containing 
155 students was designated as the key group 
and the sample of 154 was designated as the 
cross validation group. 

Since the inventory was constructed to be used 
with a battery of two other preregistration tests,
scores on these tests were obtained for both
groups. The two preregistration tests involved 
were the American Council on Education Psy- 
chological Examination, Linguistic subtest, and a 
local English achievement test, entitled the Eng- 
lish Placement test. Raw scores are customarily 
converted to a one-to-nine scale at the Univer- 
sity of Nebraska and these converted scores were 
used in this investigation. 

The criterion used in this study was first-se- 
mester average course mark. Course marks are 
also reported on a one-to-nine scale at the Uni- 
versity of Nebraska, nine being the highest mark 
and one signifying failure. Weighted averages 
for the students were obtained according to the 
hours of credit involved for individual courses. 

Thus the data for this study, other than scores
on the inventory, included: first-semester average
course marks, ACE-L scores, and English Placement
scores for two groups of 155 and 154 students
each.


Table 1


Weights Assigned to the Item Correlations
for Each Key


Correlation         Weight
0.25 or higher        +2
0.10 to 0.24          +1
-0.09 to 0.09          0
-0.10 to -0.24        -1
-0.25 or lower        -2


Development of the Keys. To develop the 
two keys for the inventory, two separate analy- 
ses were made of each item response. In con- 
structing the key according to the external cri- 
terion technique, the correlation between each 
item response and the criterion was estimated 
with the use of Flanagan’s correlation table (3). 
In constructing the key according to the deviate 
technique the correlation between each item re- 
sponse and the distribution of actual-minus-pre- 
dicted course marks was obtained in the same 
manner. The regression equation used in ob- 
taining the actual-minus-predicted distribution 
was 


$$\hat{Y} = .197 X_3 + .195 X_4 + 4.125$$


where $X_3$ is English Placement score in stanine
form and $X_4$ is ACE-L score in stanine form.
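
As a small illustration of the actual-minus-predicted measure, the sketch below applies the regression equation above to one hypothetical student. It assumes that the coefficient .197 belongs to the English Placement score and .195 to the ACE-L score, following the order in which the two variables are named; the function names and the student's values are made up.

```python
def predicted_mark(english_placement, ace_l):
    """Predicted first-semester average mark from the two battery tests (stanines)."""
    return 0.197 * english_placement + 0.195 * ace_l + 4.125

def deviate(actual_mark, english_placement, ace_l):
    """Actual-minus-predicted criterion measure for one student."""
    return actual_mark - predicted_mark(english_placement, ace_l)

# Hypothetical student: average mark 7.0, English Placement 5, ACE-L 6.
example_deviate = deviate(actual_mark=7.0, english_placement=5, ace_l=6)
print(round(example_deviate, 2))
```

A positive deviate marks a student who did better than the battery predicted; item responses are then correlated with these deviates rather than with the raw course marks.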

After the two complete sets of correlations had 
been obtained, the two keys were constructed by 
assigning weights to each item response accord- 
ing to the size and direction of the correlation 
with each criterion. The weighting system used 
for each key is shown in Table 1. 
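
The keying and scoring steps can be sketched as below. The weight bands follow Table 1 and the constant of 50 follows the scoring description in this section; the function names, the rounding of correlations to two decimals before banding, and the example correlations are assumptions for illustration.

```python
def response_weight(correlation):
    """Map an item-response correlation to a key weight using the Table 1 bands."""
    r = round(correlation, 2)  # assumption: bands apply to two-decimal values
    if r >= 0.25:
        return 2
    if r >= 0.10:
        return 1
    if r > -0.10:
        return 0
    if r > -0.25:
        return -1
    return -2

def build_key(correlations):
    """correlations: {(item, response): r} -> {(item, response): weight}."""
    return {pair: response_weight(r) for pair, r in correlations.items()}

def score_inventory(answers, key, constant=50):
    """answers: {item: chosen response}. Sum the keyed weights plus a constant."""
    return constant + sum(key.get((item, resp), 0) for item, resp in answers.items())

# Hypothetical correlations for two items with two responses each.
key = build_key({(1, "a"): 0.31, (1, "b"): -0.12, (2, "a"): 0.04, (2, "b"): -0.27})
score = score_inventory({1: "b", 2: "b"}, key)
print(score)
```

The same functions serve both keys: only the criterion with which the responses were correlated differs (course marks for the external criterion key, actual-minus-predicted marks for the deviate key).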

The limits of the intervals in the distribution 
of correlations as well as the weights were arbi- 
trarily designated. Because the use of Flana- 
gan’s table for estimating correlations involves 


only the upper and lower 27 per cent of the cri- 
terion distribution, the significance from zero of 
the estimated correlation coefficients was not 
ascertained. 

The degree of similarity between the weights 
assigned to each item response for the two keys 
may be seen from Table 2. Each of the 201 
items of the inventory contained from two to 
five response choices which yielded the total of 
629. The coefficient of correlation between the 
two distributions of response weights shown in 
Table 2 is .509. The deviate technique key con- 
tained 368 item response weights other than zero 
as compared with 339 such response weights for 
the external criterion technique key. It should 
be recalled that in responding to the inventory,
however, each subject gave 201 responses, rather 
than 629. 

The number of items having response weights 
of zero for all choices within the item was found 
to be 53 for the external criterion technique key 
and 42 for the deviate technique key. Of the 
items having all response weights of zero, 34 
such items appeared in both keys. In summary, 
eleven more items in the deviate technique key 
than in the external criterion technique key con- 
tained one or more response weights other than 
zero. 

The inventories of the 154 students in the 
cross validation group were scored using each of 
the two keys. To avoid negative scores, the 
constant 50 was added to each of the two in- 
ventory scores for the subjects in the cross vali- 
dation group. The correlations between each of 
the two inventory scores and the criterion and 
between these scores and the other test scores in 
the battery were computed. The significance of 
the contribution of each independent variable to 
the prediction scheme was ascertained by analy- 
sis of regression. 


Results 


In Table 3 are shown the zero order coefficients
of correlation between the variables
in this study. It is interesting to note that
the two inventory scores yielded correlations
of the same magnitude with the criterion. It
should also be noted that, in general, the
deviate technique score correlated lower with
the other scores in the battery than the external
criterion technique.


Table 2


Item Responses Classified by Weight According to Two Keys

(Only the two rightmost columns survive in the source; the row
labels and the columns for the remaining weights are missing.)


External Criterion     Deviate Technique Key Weight
Key Weight                  +1       +2
                            12       16
                            49        9
                            55       12
                            17        0
                             2        1


Table 3


Zero Order Coefficients of Correlation Between Each
Pair of Variables for 154 Cross
Validation Students


         X1      X2      X3      X4
Y       .446    .446    .512    .332
X1              .735    .480    .478
X2                      .334    .304
X3                              .624


Y = Average Course Mark.
X1 = External Criterion Technique score.
X2 = Deviate Technique score.
X3 = English Placement score.
X4 = ACE-L score.

In Table 4 are shown the multiple and
partial correlation coefficients of the combined
variables. Inspection of Table 4 indicates
that the deviate technique score contributed
significantly to the effectiveness of the total
battery, whereas the external criterion technique
score did not. In addition, the optimal
combination of two prediction variables includes
the English Placement test score and
the deviate technique score of the inventory.


Table 4


Multiple and Partial Correlation Coefficients
for Combined Variables


Partial Correlations

r YX2.X1X3X4 = .329**
r YX1.X2X3X4 = .055
r YX1.X3X4   = .084
r YX2.X3X4   = .358**


Multiple Correlations

R Y(X1X2X3X4) = (value illegible)
R Y(X2X3X4)   = .587
R Y(X1X3X4)   = .517
R Y(subscripts illegible) = .516
R Y(X2X3)     = .570
R Y(subscripts illegible) = .362
R Y(subscripts illegible) = .485
R Y(subscripts illegible) = .515


Y = Average Course Mark.
X1 = External Criterion Technique score.
X2 = Deviate Technique score.
X3 = English Placement score.
X4 = ACE-L score.
** Indicates a partial correlation coefficient significantly
different from zero at the 1 per cent level.


Discussion

The empirical results from this investiga- 
tion indicate that the deviate technique is 
superior to the external criterion technique 
for keying items of a new test to be added to 
an already existing battery. If the key for 
the life experiences inventory used in this 
investigation had been constructed using only 
the external criterion technique, the inventory 
would not have significantly increased the 
predictive effectiveness of the battery. 
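
The multiple and partial correlations of Table 4 are functions of the zero order coefficients in Table 3. The following is a minimal sketch in Python of the standard two-predictor formulas, offered only as an illustration; the function and variable names are hypothetical, and values computed from the rounded entries of Table 3 need not agree exactly with the printed coefficients.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of Y with two predictors, from zero order r's."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


def partial_r(r_yx, r_yz, r_xz):
    """First order partial correlation of Y and X with Z held constant."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))


# Zero order coefficients as printed in Table 3.
r_y_x2, r_y_x3, r_y_x4 = 0.446, 0.512, 0.332
r_x2_x3, r_x3_x4 = 0.334, 0.624

# Y predicted from English Placement (X3) and ACE-L (X4).
r_y_x3x4 = multiple_r(r_y_x3, r_y_x4, r_x3_x4)

# Y predicted from the deviate technique score (X2) and English Placement (X3).
r_y_x2x3 = multiple_r(r_y_x2, r_y_x3, r_x2_x3)

# Correlation of Y with X2 when X3 alone is held constant.
p_y_x2_given_x3 = partial_r(r_y_x2, r_y_x3, r_x2_x3)

print(round(r_y_x3x4, 3), round(r_y_x2x3, 3), round(p_y_x2_given_x3, 3))
```

Higher order partials, such as those holding three variables constant, can be built up by applying the same first order formula recursively.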

The similarity of the zero order correla- 
tions, .446 and .446, between average course 
marks of the cross validation group and the 
two inventory scores is striking. It is doubt- 
ful that such close correspondence will be 
found in subsequent studies of a similar na- 
ture. In general, it seems reasonable to 
postulate that the more homogeneous the 
criterion, the more divergent such coefficients 
of predictive effectiveness for the two tech- 
niques will become, i.e., with a homogeneous 
criterion the external criterion technique cor- 
relation will probably be higher than that 
found for the deviate technique. The simi- 
larity of the two coefficients found in this 
study is perhaps the result of the heterogeneity 
of the criterion. 

The assignment of weights ranging from 
—2 to +2 to the item responses of the in- 
ventory imposed a condition of item selection 
on the keying procedures. Some items were 
assigned a weight of zero according to one 
keying technique and weights other than zero 
according to the other technique. Such dif- 
ferences between the weights assigned to the 
item responses will influence the apparent 
length of an instrument and the variability 
among the resulting total scores. Thus in 
comparing validity coefficients to evaluate 
two keying techniques, consideration should 
be given to differences between measures of 
central tendency and variability of the total 
score distributions. If the variability of the 
distribution obtained by one keying tech- 
nique is considerably larger than the varia- 
bility of the other distribution, differences 
between validity coefficients could result 
which are attributable to differences between 
the total score variabilities rather than to 
actual differences in effectiveness of the tech- 
niques. When the means and standard devia- 
tions for the two total score distributions in- 
volved in this study were computed, the 
means were found to be 55.44 and 54.06 and 
the standard deviations were found to be 
17.59 and 16.02 for the external criterion 
technique and the deviate technique, respec- 
tively. Because these differences are so small, 
it is felt that the greater contribution made 
by the deviate technique key scores to the 
predictive effectiveness of the total battery 
was not attributable to the item selection im- 
posed by the weighting procedure. Ap- 
parently the scored items of the deviate tech- 
nique key contained more similar response 
weights within the items than the scored 
items of the external criterion technique key. 
Such a condition could result in a larger mean 
and standard deviation for the external cri- 
terion technique key total scores. 

The fact that the score on the life ex- 
perience inventory contributed significantly 
to the prediction of average course marks 
suggests the importance of evaluating other 
characteristics of students than scholastic 
aptitude and achievement. A detailed descrip- 
tion of the content, construction, and analy- 
sis of the instrument used as a vehicle for 
this study will be published subsequently. 


Summary 


The purpose of this study was to determine 
the relative effectiveness of keying the items 
of an inventory to be added to an already 
existing test battery according to: (1) the 
correlation of the item responses with the 
total variation in a criterion (first semester 
average course marks); and (2) the correla- 
tion of the same item responses with the 
criterion variation unexplained by other tests 
in the battery. Two sets of keys were con- 
structed based upon the responses of 155 
subjects. Each inventory of 154 subjects 
constituting a cross validation group was then 
scored using the two keys. The zero order 
correlations between the score derived from 
each key and the criterion were found to be 
identical for the 154 subjects in the cross 
validation group. When the two scores were 
combined with others in a test battery the 
contribution to the predictive effectiveness of 
the total battery made by the key derived 
from correlating item responses with the un- 
explained variation was found to be signifi- 
cant. The contribution made by the key 
derived from correlating item responses with 
the total criterion variation was found to be 
not significant. 


Received October 21, 1953. 
Industrial accidents, their prediction and 
control and the various factors related to 
and affecting them have long been a subject 
of study for the psychologist in industry. 
One of the specific topics of interest has been 
the relationships existing between the age 
and experience of the worker and his accident 
frequency. 

Most research studies in this area have 
demonstrated the existence of some relation- 
ship between accidents and both experience 
and age. Though by no means universal, the 
general conclusion arrived at in these experi- 
ments is that accident frequency tends to 
decline with increasing age and/or experi- 
ence. 

Many of the studies of experience suffer, 
however, from a procedural error. The most 
common method applied in this type of study 
appears to be to divide the men in a given 
organization into experience groups and then 
to calculate the accident rate of each group. 
The application of this method of necessity 
assumes that if no differences in experience 
exist, all of these different groups would have 
the same average number of accidents. How- 
ever, it is also reasonable to assume that in 
many jobs the high-accident employees will 
tend to drop out either through retirement 
due to injury, separation or voluntarily leav- 
ing employment. Such a natural selection 
process tends to retain on the job only those 
persons who have maintained a certain safety 
standard in their operations. 
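
The selection argument above can be illustrated with a small arithmetic sketch in Python. All figures are hypothetical and are not drawn from any plant's records; each worker is assumed to have a fixed personal accident rate, so that experience by itself changes nothing.

```python
from statistics import mean

# Hypothetical cohort hired together: (accidents per year, years stayed).
# Five low-rate workers stay five years; of five high-rate workers,
# four leave after the first year and one stays.
cohort = [(1.0, 5)] * 5 + [(3.0, 1)] * 4 + [(3.0, 5)]


def mean_rate_of_those_present(workers, year_of_service):
    """Mean accident rate among workers still employed in a given year of service."""
    return mean([rate for rate, years_stayed in workers if years_stayed >= year_of_service])


# Grouping by experience: everyone is present in year 1, only stayers in year 5.
first_year_group = mean_rate_of_those_present(cohort, 1)
fifth_year_group = mean_rate_of_those_present(cohort, 5)

# Following the same stayers back to their own first year shows no change,
# because their personal rates were constant by assumption.
stayers_in_first_year = mean([rate for rate, years_stayed in cohort if years_stayed >= 5])

print(first_year_group, fifth_year_group, stayers_in_first_year)
```

In this sketch the experienced group shows a lower mean rate than the first-year group although no individual worker improved, which is the artifact a longitudinal design is meant to avoid.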

The usually discovered decrease in acci- 
dent frequency with experience may be due 
then to this natural selection process. What 
would then appear to be necessary in order 
that the effects of experience may be properly 
evaluated is to follow the accident history of 
the same group of workers over a period of 
time. Several studies (1, 2, 3, 5) have done 
this. Unfortunately, in most instances these 
studies either follow the employee's accident 
history for only a relatively short duration or 
fail to remove possible influences due to the 
operation of the age variable. 

The study of the relationship between age 
and accident frequency presents a somewhat 
similar picture to the experience problem. 
The typical procedure here again is to sub- 
divide employees into differing age groups 
and to compute the mean number of acci- 
dents for each age group. In most instances, 
however, age is highly correlated with ex- 
perience, thus confusing the issue and making 
it difficult, if not impossible, clearly to ascribe 
any discovered relationship either to age or 
to experience. 

Attempts have been made to minimize the 
effect of the experience variable through the 
utilization of partial correlational methods 
(1, 3). However, these methods are also 
subject to question in that it is not certain 
that experience may be held constant by 
using partial correlation methodology in view 
of the safety selection process previously 
mentioned. It seems probable that the opera- 
tion of these selective factors prevents com- 
pliance with certain basic assumptions in- 
herent in this statistical method. 

It is the author’s purpose therefore to pre- 
sent material obtained in a different manner 
from most of the previous studies in this field 
in an attempt to provide more information 
and gain further insight into the relation- 
ships of age and experience to accident 
frequency. 


Subjects 


The subjects used in this study are em- 
ployees of a copper plant in Indiana. These 
subjects were selected from six sections com- 
prising a single large department operating 
metal forming mills. Work tasks were iden- 
tical for the members of all groups and no 
unusual differences in pressures of produc- 
tion were observed for the different groups 
during the periods of data collection. 

Conditions of work, light, heat, ventilation 
were also highly similar for all subjects as 
were the number of hours worked. Only em- 
ployees working on the same shift were used 
in this experiment. 

Conditions and methods of work, together 
with type of equipment, remained virtually 
constant throughout the five-year experi- 
mental period. 

A total of 1,237 employees who remained 
with the company in the above mentioned de- 
partment for the experimental period had 
their accident records carefully traced and 
charted for each month of the period. In 
addition other members of the work force 
hired at the same time (when the plant was 
first opened) but who dropped out or were 
separated also had their records carefully 
tabulated and recorded. These workers at 
the onset of the experimental period totaled 
an additional 1,317 workers. 

The number of accidents experienced by 
each man was readily traced through em- 
ployee history data which contained carefully 
detailed records of dispensary visits and their 
reason and cause. It is felt that this cri- 
terion is valid since it is a compulsory policy 
of the company to have all employees who 
are injured on the job, regardless of how 
slight the injury might be, visit the dis- 
pensary for medical clearance, treatment, and 
report. No distinctions as to severity of 
accident were made in this study. Only 
accidents occurring during working hours and 
in actual performance of the job were used. 

Accident frequency data were reported on 
the basis of mean number of accidents per 
1,000 man hours of operation. Payroll rec- 
ords of the subjects provided the necessary 
data for computation. 
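
The rate described above is a simple ratio. A minimal sketch in Python follows; the function name and the monthly figures are hypothetical and serve only to show the arithmetic.

```python
def accident_rate_per_1000_hours(accidents, man_hours):
    """Accidents per 1,000 man hours of operation."""
    if man_hours <= 0:
        raise ValueError("man_hours must be positive")
    return 1000.0 * accidents / man_hours


# Hypothetical monthly totals for one group: (accidents, man hours from payroll).
monthly_totals = [(12, 48000), (9, 45000)]
monthly_rates = [accident_rate_per_1000_hours(a, h) for a, h in monthly_totals]

print(monthly_rates)
```

Dividing by hours actually worked, rather than by head count, keeps months with different amounts of work comparable.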


Results 


Figure 1 displays graphically the mean 
number of accidents per 1,000 man hours of 
operation for both of the experimental groups 
and also the entire departmental mean acci- 
dent rate. Accident rate figures are reported 
on a monthly basis for a period of 60 months 
or 5 years. Accident rates for the turnover 
group are not reported after the first 30 
months because of the small number of work- 
ers remaining in that group beyond this 
period of time. (The number of workers in 
the turnover group was reduced to 243 mem- 
bers at the end of thirty months.) 

It can readily be seen from the presented 
data that in this particular instance the acci- 
dent rate for these workers declines rapidly 
during the first five months of operation for 
both of the groups. The entire department 
mean accident rate closely approximates the 
rate curves of the two experimental groups. 
This is readily explained by the fact that the 
two experimental groups, particularly in the 
early phases of the experimental period, com- 
prised a majority of the entire work force. 

The tendency for the departmental rate 
curve to be higher during the latter phases of 
the experimental period can be attributed to 
the incorporation of newly hired employees 
into the work force. 

The consistency with which the accident 
rate curve of the turnover group remains 
higher than that of their fellow-workers tends 
to support the hypothesis that a natural se- 
lection process does exist. The higher rate 
and more gradual decline in accident fre- 
quency for this group of turnover employees 
apparently is indicative of an informal and 
perhaps to some extent a formal weeding out 
of high-accident workers. 

In studying these accident rate graphs the 
effect of job experience upon the accident rate 
of these workers appears to be considerable 
for their first five months of employment, but 
seems to be of little significance beyond the 
fifth month of employment. The general 
leveling off in accident rate after five months 
on the job seems to point up the thesis that 
experience makes its contribution towards ac- 
cident rate reduction by familiarizing the em- 
ployee with proper work and safety habits. 
Apparently five months of on-the-job duties 
is sufficient for these workers on this particu- 
lar type of operation to become well enough 
trained to reduce accident rate to what may 
be considered normal expectancy. It should 
be pointed out that these workers did not re- 
ceive the benefit of any formalized pre-job 
assignment training and so actual experience 
was called upon to substitute for this formal- 
ized training. 

Fig. 1. The relationship between experience on the job and the average monthly accident rate per 1,000 
hours of operation for a non-turnover group and a turnover group. 


These initially high accident rates would 
appear to lend further support to the often 
stressed necessity for proper immediate train- 
ing in correct work methodology and safety 
habits. 

In order to provide an experimental test of 
this conclusion the accident rates of another 
872 workers were charted. These men had 
been hired at various times after the com- 
pany was better established and so received 
the benefit of formal training in correct job 
procedure and safety methods. These men 
also performed the same work tasks under 
highly similar conditions. Data on this group 
for their first fifteen months of employment 
are graphically presented in Figure 2. 

Results here follow the same general pat- 
tern found for the previous groups. There is 
an almost identical sharp decline in accident 
frequency for the early on-the-job period fol- 
lowed by the same leveling off pattern. Of 
note, however, is the fact that the initial 
accident frequency rate is markedly lower for 
this group. Furthermore, the level which 
approximates what has been termed normal 
expectancy for the previous groups is reached 
after the third month of on-the-job perform- 
ance rather than after the fifth. 

In view of the strong similarity between 
the work tasks and work environment of this 
and the other two previous groups, this reduc- 
tion in the frequency of accidents amongst 
these workers for this formative period can in 
the author’s opinion be traced only to the 
benefits derived from the formal training 
program. 

However, the observed decline, still sharp, 
for these trained workers during the early 
phases of their employment still suggests the 
importance of actual accumulated on-the-job 
experience in bringing accident rates down to 
what might be considered normal. 

Still untested is the effect of age upon acci- 
dent frequency. To study this relationship 
two other groups were formed. These groups 
were matched on the experience variable. 
Group A was a young group (Mean Age 
= 28.7 years, S.D. = 1.4, N = 639) with ap- 
proximately three years of experience (Mean 
experience = 2.9 years, S.D. = .45). Group 
B was composed of older workers (Mean Age 
= 41.1, S.D. = 2.9, N = 552) also with ap- 
proximately three years of experience (Mean 
experience = 3.2 years, S.D. = .63). 

Accident frequency rate for these groups 
(Figure 3) differs markedly throughout the 
eighteen month experimental period. Al- 
though both groups have the same amount of 
experience, the younger group has what ap- 
pears to be a significantly higher accident rate 
than their older work companions. 

As might be expected the younger group’s 
(Group A) accident rate is above the depart- 
ment level while the mean accident rate of 
the older group (Group B) is below the de- 
partment’s level for these particular periods 
of time. 

To further pursue this study of the effects 
of age upon accident frequency rate a third 
group (Group C) was used. These workers 
were similar to Group B in that they too 
were an older group (Mean Age = 39.2, S.D. 
= 3.1, N = 297), but unlike either of the 
two previous experimental groups these men 
were inexperienced at the onset of the experi- 
mental period. They did, however, receive 
the benefit of training prior to actual job 
assignment and performance. 

As can be seen from Figure 3, the accident 
frequency rate for this group again as in past 
instances shows the same early sharp decline 
followed by a general leveling off to a posi- 
tion approximating that of the older group 
(Group B). The accident rate of Group C 
follows also the pattern of the previous trainee 
group although mean accident frequency is 
somewhat lower throughout the period. 

It is to be noted that from the third month 
onward and practically from the second 
month onward the accident rate for this 
group of workers is lower than that of their 
younger and much more experienced fellow 
workers (Group A). It is also to be noted 
that this older group functions below the 
mean departmental level after what might be 
termed the three-month breaking-in period. 

The greater strength of the relationship be- 
tween age and accident frequency rate as 
compared with experience and accident fre- 
quency rate becomes even more noticeable as 
the experience level differences begin to dis- 
appear with the passage of time spent on the 
job. 

It would appear then from these data that 
age is definitely related to accident frequency. 
Older employees in this study even when less 
experienced maintain better safety records 
than do the younger men. The accident rate 
of these younger workers exceeds slightly the 
mean accident rate level of the entire depart- 
ment despite the disparity in job experience. 
Their rate, in fact, appears from the data to 
be exceeded only by those employees who are 
currently in the breaking-in stage of de- 
velopment. 


Summary and Conclusions 


The results obtained in this experiment 
seem to indicate that at least for these groups 
of men and for this particular type of opera- 
tion the effect of experience upon the fre- 
quency rate of accidents apparently is limited 
to a three to five month period of initial on- 
the-job performance. This particular period 
of time may be termed a breaking-in period 
and it is characterized by a sharp decline in the 
number of accidents. Following this period 
there is a leveling off in accident rate through- 
out the employee’s work history. This rather 
level period may be considered to be normal 
expectancy. 

When the workers are given formal train- 
ing prior to actual job performance there is a 
considerable reduction in early accident fre- 
quency rate, which is manifested in lower 
initial accident frequency and also in what 
may be regarded as a faster developmental 
period in that the amount of time required 
for the trained work groups to level off at the 
normally expected frequency is significantly 
reduced. 

It would appear that age in this instance 
exerts a greater influence upon 
accident rate than does experience once the 
breaking-in stage is passed. From the com- 
parisons made between the matched work 
groups it has been found that older workers 
tend to have fewer accidents than their 
younger co-workers. This appears to be true 
throughout the employee’s work history when 
similar groups are compared. Lower accident 
rates are remarkably characteristic of these 
older men from their earliest job performance 
on. 

It is the author’s opinion, although no con- 
clusive evidence is presented, that since age 
exerts the stronger influence upon accident 
frequency rate, beyond initial employment, 
it is necessary to explain accidents in part on 
the basis of immaturity of employees. Fur- 
thermore, the usually found reduction in acci- 
dent rate with increasing age and experience 
can also be attributed to some extent to the 
operation of a natural selection process which 
results in the weeding out of workers less fit 
for the job. It is also felt that little im- 
portance can be attached to the effect of ex- 
perience upon accident rate for periods other 
than that of initial employment particularly 
when the effects of age and the natural selec- 
tion process are eliminated. Proper training 
in correct work methodology and safety 
habits can further reduce the effect of experi- 
ence upon accident rate but cannot apparently 
substitute completely for actual job perform- 
ance in helping the worker to internalize fully 
the correct procedures and habits necessary 
to efficient operation from the safety stand- 
point. 

Received September 28, 1953. 
Note on Age and Productive Scholarship of a University Faculty 


Robert A. Davis 


George Peabody College for Teachers 


The results presented are a part of a larger 
study conducted for the Council on Research 
and Creative Work of the University of 
Colorado in 1946. The study was designed 
to survey the research and writing activity of 
the entire faculty (representing all the schools 
and colleges) during a twenty-year period, 
1920-1939 inclusive. This is a period that 
we believed would reflect trends between two 
major wars—a relatively stable period in the 
history of the university. 

During the period covered by the study 
faculty members had been requested annually 
to submit to the Dean of the Graduate School 
a list of papers, articles, and books written 
during the year just ended; and these items 
were published annually in the Graduate 
Bulletin. In order to safeguard accuracy the 
author sent each faculty member a list of his 
contributions as recorded in the Graduate 
Bulletin and requested that they be checked. 

The terms research and writing should be 
noted. No effort was made to differentiate 
between items that were definitely of research 
character and those that were scarcely more 
than descriptive or expository documents. 
Also attention is called to the term activity. 
The study did not deal with the difficult prob- 
lem of appraising contributions of faculty 
members. Instead, it was concerned exclu- 
sively with the amount of research and writ- 
ing completed. 

The data reported here concern only one 
aspect of the larger study, that of research 
and writing in relationship to the age of the 
faculty member at the time. During the 
period covered by the study any person con- 
tributing one item was regarded as writing. 
Co-authors were treated in the same manner 
as authors writing independently. In cases 
of multiple authorship each person received 
the same credit that he would have received 
as a single contributor. The curves show 
absolute and not proportionate numbers of 
contributions. Consequently, they do not 
make allowances for the diminishing numbers 
of potential contributors at the upper age 
levels. Figure 1, which is based on the rec- 
ords of 385 faculty members, tells the story. 

The results suggest a number of questions. 
How do research and writing activity relate 
to the age at which a faculty member attains 
full professorial status? How do they relate 
to salary increases and promotional policy in 
general? What should be the policy of a 
university administration regarding research 
and writing? What means may be used to 
stimulate research? Other kinds of profes- 
sional growth? If faculty members as a 
group reach a peak in research and writing 
activity around 45 years of age is there evi- 
dence that they continue to grow profession- 
ally in other respects? Is there any funda- 
mental reason for the peak of activity around 
45 years of age? Is this a crucial period in 
the career of a faculty member? The reader 
will think of many other questions. 


Received October 16, 1953. 
Relationship of Employee Morale to Ability to Predict 
Responses ¹ 


Rossall J. Johnson 


School of Commerce, Northwestern University ² 


This investigation is concerned with the 
relationship between the morale of an em- 
ployee and the ability of the employee to 
predict the responses of his subordinates and 
the morale of these subordinates. 

There has been some evidence (1, 4, 5) to 
indicate that where individuals “knew” and 
understood one another they were able to 
predict the others’ responses. It would seem 
to follow from this that where a group and 
leader relationship existed, the ability of the 
group members to predict the leaders’ re- 
sponses would be dependent upon how well 
these group members understood their leader. 
And conversely, the ability of the leader to 
predict the group members’ responses would 
be dependent upon how well the leader under- 
stood the group members. 

This problem may be clarified by asking 
three questions. 1. Do subordinates with 
high morale “know” and understand their 
supervisor better than low morale subordi- 
nates? 2. Is the morale of the subordinates 
who “know” and understand their boss higher 
than those who do not “know” and under- 
stand their boss? 3. Does the supervisor 
“know” and understand the high morale sub- 
ordinates better than the low morale sub- 
ordinates? If the answers to questions 1 and 
2 are yes, then one may anticipate the de- 
velopment of a questionnaire which will in- 
dicate morale by measuring the ability of the 
subordinate to predict the responses of his 
supervisor. An affirmative answer to question 
3 would indicate that the morale of the sub- 
ordinates may be estimated by measuring the 
ability of the supervisor to predict the re- 
sponses of his subordinates. 

In order to analyze this problem, the fol- 
lowing null hypotheses were set up: 


1 This paper was presented at the MPA annual 
meeting, Columbus, Ohio, April 30, 1954. 

2 This is part of a doctoral dissertation done un- 
der the direction of Dr. H. H. Remmers of Purdue 
University. 


1. There is no significant difference be- 
tween the morale scores of subordinates who 
can predict the responses of their supervisors 
best, and the morale scores of subordinates 
who have the least success in predicting the 
responses of their supervisors. 

2. There is no significant difference be- 
tween the ability of high morale subordinates 
to predict the responses of their supervisors 
and the ability of low morale subordinates to 
predict the responses of their supervisors. 

3. There is no significant difference be- 
tween the morale scores of individual sub- 
ordinates whose responses were most success- 
fully predicted by their supervisors and the 
morale scores of individual subordinates whose 
responses were least successfully predicted by 
their supervisors. 


Procedure 


A sample of 227 subordinates and 25 supervi- 
sors was taken from two companies. The sub- 
ordinate, for the purpose of this study, is desig- 
nated as a randomly selected hourly paid worker 
who does not have group leader responsibility 
and who has worked for the tested supervisor for 
at least nine months. The supervisor is defined 
as a salaried supervisor who has at least 12 sub- 
ordinates (as defined above) reporting directly 
to him. This supervisor should have supervised 
these 12 subordinates at least nine months. 
Eight to 10 subordinates under each of the 25 
supervisors participated in the project. 

Three scores were calculated from the ques- 
tionnaire: (1) subordinate morale score; (2) 
supervisor predicting score; and (3) subordinate 
predicting score. The subordinate morale score 
is the number of times, out of a possible 20, that 
the subordinate selected the most favorable re- 
sponse. The supervisor predicting score is the 
total number of times the supervisor correctly 
predicts the subordinate’s response to 20 ques- 
tions. 
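
The two scores defined above are simple counts over the 20 items. A minimal sketch in Python follows; the response labels, the function names, and the sample answers are hypothetical and are used only to show the counting.

```python
def morale_score(responses, most_favorable):
    """Number of items on which the subordinate chose the most favorable response."""
    return sum(1 for given, best in zip(responses, most_favorable) if given == best)


def predicting_score(predicted, actual):
    """Number of items on which the predicted response matches the actual response."""
    return sum(1 for guess, given in zip(predicted, actual) if guess == given)


# Hypothetical answers of one subordinate to 20 two-choice items.
actual_answers = ["agree", "disagree"] * 10

# Hypothetical supervisor predictions: wrong on the first six items, right on the rest.
predicted_answers = list(actual_answers)
for i in range(6):
    predicted_answers[i] = "disagree" if actual_answers[i] == "agree" else "agree"

supervisor_score = predicting_score(predicted_answers, actual_answers)
print(supervisor_score)
```

Both scores therefore range from 0 to 20, which is the scale on which the cut-off scores in the Results section are stated.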

The subordinate questionnaire consisted of 
three parts. Part A was a selection of 20 ques- 
tions from form A of the test How Supervise? 
These questions were from the sections on super- 
visory practices and supervisory opinions. As 
shown in the example, the question mark or un- 
decided response alternative was omitted. Part 
B consisted of 20 morale questions. In a previ- 
ous study (2) these questions had D values (3) 
of 1.10 or higher. Part C consisted of the same 
questions as in part A but with instructions to 
predict the response that the subordinate thought 
his supervisor would give to each question. 

The subordinate was guaranteed anonymity. 
His name and personal data were on a separate 
sheet deposited in a ballot type of box while the 
questionnaire with the supervisor’s name only 
was deposited in another box. This question- 
naire and the personal data sheet were later 
brought back together by means of a code. 

The supervisor questionnaire consisted of two 
parts. Part A is made up of the same 20 ques- 
tions from form A of How Supervise? as were 
used in the subordinates’ questionnaire. The su- 
pervisor answered these questions as he would if 
he were answering the complete form, except that 
the question mark or undecided response alterna- 
tive was omitted. Part B also consisted of the 
same 20 How Supervise? questions and 20 morale 
questions mentioned above. With part B the 
supervisor was given a list of names of the sub- 
ordinates who filled in the subordinate question- 
naire. A maximum of ten names was on the 
list. Each man had a code number assigned, 
such as John Jones—No. 1, Bill Smith—No. 2, 
Ted Green—No. 3, etc. The supervisor was 
asked to predict how each subordinate answered 
the 40 questions. For example: when the fore- 
man had decided how John Jones had answered 
a question, he wrote the number “1” after the 
predicted answer. He then predicted the re- 
sponses of subordinates Bill Smith, Ted Green, 
and so on in the same manner until he had pre- 
dicted the eight or ten subordinates’ responses to 
all of the questions. 


Results 


Supervisors predicted the responses of 25% 
of the subordinates with scores of 14 or 
higher. This constituted the high group. 
Supervisors predicted the responses of 23% 
of the subordinates with scores of 10 or lower. 
This made up the low group. A t test was 
made to determine if there was a significant 
difference between the mean morale score of 
the high group and the mean morale score of 
the low group. 

As shown in Table 1, the hypothesis that 
there is no significant difference between the 
morale scores of individual subordinates 
whose responses were most successfully pre- 
dicted by their supervisors and the morale 
scores of individual subordinates whose re- 
sponses were least successfully predicted by 
their supervisors was not rejected. The t 
value indicates that there evidently is no sig- 
nificant difference between the means of the 
individual morale scores. 


Table 1 


Comparison of Mean Subordinate Morale Scores for 
High-Low Supervisor Predicting Scores 


                              Supervisor Predicting Scores 
                              High 25%      Low 23% 

Mean Subordinate 
Morale Score                    12.3          11.8 

t value 

Significance Level 


Nineteen per cent of the subordinates had 
scores of 17 or higher on the morale survey 
questions. This group was considered the 
high morale group. Twenty per cent of the 
subordinates had scores of 8 or lower. This 
group constituted the low morale group. The 
subordinate individual predicting scores on 
the 20 questions of form A of How Super- 
vise? for the high morale group were added 
and a mean score obtained. 

The mean score for the low morale group 
was obtained the same way. A t test was 
made to see if there was a significant differ- 
ence between the mean of the high morale 
group and the mean of the low morale group. 

Table 2 shows that the hypothesis that 
there is no significant difference between the 
ability of high morale subordinates to predict 
the responses of their supervisors and the 
ability of low morale subordinates to predict 
the responses of their supervisors was re-
jected. The t value indicates that the high
morale subordinates’ mean score was sig-
nificantly higher than the low morale sub-
ordinates’. The difference in the means was
significant at the 1% level.


Table 2

Comparison of Mean Subordinate Predicting Scores
for High-Low Subordinate Morale Groups

                                      Morale Scores
                                      High 19%      Low 20%

Mean Subordinate Predicting Score       12.6          9.8
t value                                 13.8
Significance Level                      1%

The mean morale score of the subordinates
who were most successful in predicting their
supervisor’s responses was tested by the t
test method to see if it was significantly
different from the mean morale score of the
subordinates who had the least success in
predicting their supervisor’s responses. Sub-
ordinates who predicted the responses of their
supervisors most successfully were those with
predicting scores of 14 or higher. This group
represented the top 24%. Subordinates who
predicted the responses of their supervisors
least successfully were those with predicting
scores of 9 or less. This group constituted
the bottom 20%.

The hypothesis that there is no significant 
difference between the morale scores of sub- 
ordinates who can predict the responses of 
their supervisors best and the morale scores 
of subordinates who have the least success in 
predicting the responses of their supervisors 
was rejected. Table 3 shows that the mean 
morale score of the subordinates who had 
high individual predicting scores was sig- 
nificantly higher than the mean morale score 
of low predicting subordinates. This differ-
ence as shown by the t value was significant
at the 1% level.


Table 3

Comparison of Mean Subordinate Morale Scores for
High-Low Subordinate Predicting Scores

                        Subordinate Predicting Scores
                        High 24%      Low 20%

Mean Morale Score         13.4          10.2
t value                   3.64
Significance Level        1%


Discussion and Conclusions 


Based on these data, the following con- 
clusions may be drawn. 

1. No general relationship can be stated
between the morale state of the subordinate
and the ability of the supervisor to predict
his responses. The supervisor is evidently
able to predict the responses of some low
morale subordinates with as much skill as
those of some high morale subordinates. The
non-rejection of hypothesis 1 might be ex-
plained by the fact that some low morale sub-
ordinates vociferate their objections or criti-
cisms of certain things. Because they express
themselves forcefully, the supervisor remem-
bers these comments and is thus in a better
position to predict the responses of low
morale subordinates than he is to predict the
responses of those who have average morale.

2. High morale subordinates are better pre- 
dictors of their supervisors’ responses than are 
low morale subordinates. 

3. The subordinates who could predict the 
responses of their supervisors best had higher 
morale than those subordinates who had the 
least success in predicting their supervisors’ 
responses. 

These last two conclusions may be inter- 
preted as meaning that those who were better 
acquainted with their supervisor had higher 
morale and those who had high morale 
“knew” and understood their supervisor 
better. 

In connection with conclusion No. 2 there 
was a possibility that high morale employees 
were assigning more of their own responses 
to the supervisor than were the low morale 
employees. If projection were taking place
more with high morale employees than with 
the low morale employees, the difference in 
their ability to predict responses might be 
more a measurement of differences in pro- 
jection. 

To investigate this phase, the answers 
marked by the employee on the How Super- 
vise? test were compared with the answers 
the employee predicted his supervisor made. 
The number of times the answer and pre-
diction differed was tallied. It was found
by the t test method that the average number
of responses which were different for the high
morale group was not significantly different
from the average number of answers which
were different for the low morale group.
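
The tally described above can be sketched in a few lines. The answer coding ("A" for agree, "D" for disagree) and the sample answers are hypothetical; the paper does not give the response format in this passage.

```python
def count_differences(own_answers, predicted_answers):
    """Count the items on which an employee's own answer differs from the
    answer he predicted for his supervisor.

    A low count means the employee mostly attributed his own answers to the
    supervisor, which is what projection would look like.
    """
    if len(own_answers) != len(predicted_answers):
        raise ValueError("both answer lists must cover the same items")
    return sum(1 for own, predicted in zip(own_answers, predicted_answers)
               if own != predicted)


# Hypothetical answers to five items (the study used 20).
own = ["A", "A", "D", "D", "A"]
predicted_for_supervisor = ["A", "D", "D", "D", "A"]
print(count_differences(own, predicted_for_supervisor))
```

The per-employee counts for the high and low morale groups would then be averaged and compared by a t test.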


Summary 


An analysis was made to see if supervisors 
are able to predict the responses of high 
morale subordinates more successfully than 
those of low morale subordinates. An analy- 
sis was also made to see if there was a differ- 
ence in morale scores of those subordinates 
who were able to predict their supervisors’ re- 
sponses most successfully and those who were 
least successful in predicting the supervisors’ 
responses. The results indicate that super- 
visors are not able to predict the responses 
of high morale subordinates with any more 
success than the responses of low morale sub-
ordinates. The results also indicate that
high morale subordinates are able to predict
their supervisors’ responses better than low
morale subordinates.


Received July 2, 1954. 
An Application of Rogerian Concepts to Nurse-Patient
Relationships 


Lewis Bernstein 1


Veterans Administration Hospital, Denver, Colorado 


From the experience of teaching psychology 
to both student and graduate nurses, it has 
become increasingly apparent that psycholo- 
gists can contribute in an important manner 
to nursing education. This potential con-
tribution lies in the field of nurse-patient rela-
tionships, an area in their preparation which
the nurses and student nurses themselves find 
incomplete. One nurse put the problem in 
this manner: “From the beginning of our 
training, the idea of caring for the patient, 
rather than the illness, has been emphasized, 
but nowhere in our program do we have the 
opportunity to learn how to put this idea 
into practice.” Others have voiced a more 
specific need in such questions as: “I have a 
patient who has been in the hospital for two 
weeks and he has not yet had a visitor. He 
obviously feels uncomfortable and despond- 
ent during visiting hours. Is there anything 
I can do to make him feel better, or shall I 
ignore it?” “Patient X dies during the night 
and is removed from his room. On the fol- 
lowing morning, other patients on the ward 
ask about X’s whereabouts. Do we tell them 
the truth or use some subterfuge such as say- 
ing that he was transferred to another ward?” 

These, and similar incidents, represent situ- 
ations which nurses are called upon to handle, 
and for which they often feel unprepared. If 
nurses could understand the feelings expressed 
by their patients, and the motivation behind 
patients’ behavior, not only would they feel 
more comfortable in these situations, but a 
real contribution to patient care and recovery 
would result. 

1 We wish to express our appreciation to Miss
Marie L. Brophy, R.N., Chief Nurse, Miss Mary
Jane A. McCarthy, R.N., Assistant Chief Nurse, and
Miss Ruby L. Roepe, R.N., Assistant Chief, Nursing
Education, all of the Veterans Administration Hos-
pital, Denver, Colorado, without whose interest and
cooperation this study could not have been com-
pleted.

In a study by Shields (6), schools of nurs-
ing, public health agencies, individual nurses,
and other nursing groups were asked to in-
dicate, by means of a questionnaire, whether
they thought that a basic nursing curriculum 
should provide learning experiences intended 
to develop certain qualities or abilities. One 
such quality was described as: “. . . a real 
belief in the essential worth of every human 
being and . . . the importance of communicat- 
ing this belief by attitudes and actions” (6, 
p. 12). This quality appears to be a direct 
translation of Rogers’ concept of reflection 
which he defines as “. . . trying to understand 
from the client’s point of view and to com- 
municate that understanding” (5, p. 452). 
Although a large percentage of nurses who re- 
plied to the questionnaire felt that this was an 
important ability, some of the comments of 
the respondents reflect a doubt that such a 
quality can be taught. The following com- 
ments are among those reported by Shields: 
“A person either has or hasn’t this quality. 
Shouldn’t be a nurse if she hasn’t it. Can’t 
be taught (supervisor of a visiting nurse as-
sociation).” “This comes with maturity and
cannot be taught (private duty nurse).”
“Criminals too? (private duty nurse).”
“Idealistic. Impossible. No one can really
believe in the essential worth of every hu-
man being (director of a school).” “Belongs
in family teaching, not nursing education.” 

In view of such skepticism, it would seem 
worthwhile to determine empirically whether 
or not nurse-patient relationships using Rog- 
ers’ concept of reflection can be successfully 
taught. This study, then, proposes to test 
the general hypothesis that nurses’ skills and 
attitudes in interpersonal relationships can 
be modified in a significant fashion when the 
nurses understand the nature of the tech- 
niques they use, the attitudes which such tech- 
niques express or implement, and the feelings 
they generate in patients. 


Method 


A series of three pretests, to be described be- 
low, was administered to all staff and head 
nurses on duty at the Denver VA Hospital. 
Each nurse drew a number which was used as
identifying information for the tests, thus pro- 
viding personal anonymity. Upon completion of 
the pretests, it was announced that a hospital 
clinical psychologist was to conduct a course in 
nurse-patient relationships; that because of the 
size of the group, he would be able to work 
with only half of the nurses at one time. In 
order to obviate any feeling that some pref- 
erence might be operating in the selection of 
those to participate in the course, the selection 
was made by the use of random numbers, in the 
presence of the entire group, using the same 
numbers with which they identified their pretests. 
This procedure provided an experimental group 
(those selected at random to participate in the 
course), and a control group (those not se-
lected).2 Table 1 indicates the degree of match-
ing obtained by this randomization.
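
A minimal sketch of such a random split is given below. It assumes the nurses' code numbers run from 1 to 77 and that the group was divided as evenly as possible; the paper says only that about half were selected with random numbers, and it gives no sizes for the original two groups.

```python
import random


def assign_groups(code_numbers, seed=None):
    """Randomly split anonymous code numbers into two groups.

    Returns (experimental, control). With an odd total, the control group
    receives the extra member (an arbitrary choice made for this sketch).
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    shuffled = list(code_numbers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical code numbers for the 77 nurses who took the pretests.
experimental, control = assign_groups(range(1, 78), seed=1953)
print(len(experimental), len(control))
```

Because only the code numbers are drawn, the split can be made in front of the whole group without revealing whose pretest is whose.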

The course with the experimental group began 
approximately two weeks following the adminis- 
tration of the pretests. Ten weekly sessions of 
two hours each were held. The course began 
with a presentation of basic techniques which 
nurses use in responding to patients, a discussion 
of the attitudes which these techniques express,
and a discussion of how the patient might react 
to each of these techniques. For the remainder 
of the course, nurses brought to class incidents 
from their own ward experience. These inci- 
dents were reported on a form which stated the 
situation and, as nearly verbatim as possible, the 
conversation that took place between the nurse 
and the patient. Behavioral responses accom- 
panying the conversation were also reported. 
These incidents were discussed in terms of: (a) 
why did the patient behave in such and such a 
manner; what feelings was he really expressing 
by such verbalization and/or behavior; and (b) 
how could the nurse best respond in such a situa- 
tion. Many of the incidents were role-played, 
and the implications of the situation discussed 
by the group. An effort was made to conduct 
the course along the lines suggested by Rogers’ 
non-directive concepts (4, 5). That is, the in- 
structor tried to create an atmosphere in which 
participants in the experiment could feel free to
express all shades of opinion and criticism in the 
discussion of nurse-patient situations. 

Upon completion of this training, both groups 
(experimental and control) retook the tests ad- 
ministered before the course was given. 


2 The tests were originally administered to 77 staff
and head nurses. At the time of the posttesting, 59
of the original group were available—30 in the ex-
perimental group, and 29 in the control group.
Eighteen subjects who took the pretests were either
on leave or had resigned at the time of the post-
tests.


Table 1
The Degree of Matching Between the Experimental
and Control Groups Achieved by
Random Selection

                                          Experi-
                                          mental    Control
Item                                      (N=30)    (N=29)

Median age                                 33.7      32.7
Mean no. years nursing experience          14.0      12.2
No. of graduates of hospital schools       28        26
No. of graduates of collegiate schools      2         3
No. who have received degrees since
  graduation from hospital school
No. of head nurses
No. of staff nurses
No. of medical nurses
No. of surgical nurses
No. of neurological nurses
No. of psychiatric nurses
No. of operating room nurses
No. of central supply room nurses


Measurement Techniques and Hypotheses


1. The Nurse-Patient Situation Test. This
test is made up of 35 nurse-patient incidents,
modified and adapted from Porter (3), together
with five possible nurse responses to the state-
ment of the patient. Each of the choices pur-
ports to measure one of the following five cate-
gories of response: E (Evaluative), H (Hostile),*
S (Supportive), P (Probing), and U (Under-
standing).

A sample situation from the test, with the re- 
sponse choices, is the following: 

I tell you I hate that doctor of mine. I hate 
him! I hate him! I ask him about my diag-
nosis and he gives me the brush-off. Tells me
a diagnosis hasn’t been made yet. Phooey! It 
makes me feel so terrible that I hate him so— 
especially when I have to count on him to get 
well. I—it worries me. 


E (a) This is something you'll certainly want
      to get straightened out. A good rela-
      tionship with your doctor is important
      for your recovery. You'll find he'll
      treat you better if you can make your-
      self have faith in him.

H (b) You're certainly not acting very grown-
      up. These doctors know their business.
      You do an awful lot of complaining
      about something that you're getting
      free.

S (c) I guess most patients go through a pe-
      riod when they don’t like their doc-
      tors. It’s really not at all uncommon.
      I hear that from most patients. But
      things eventually settle down.

P (d) I think we ought to get at the root of
      that worry. Is there anything else
      your doctor has done to upset you be-
      sides not telling you your diagnosis?

U (e) You’re concerned about how sick you
      really are, and it worries you not to
      know for sure what your doctor thinks.


* Porter (3) used the Interpretive category in place
of our Hostile category. In an independent study of
nurses’ responses, we found the Hostile category
used more frequently than the Interpretive, and that
the few Interpretive statements made by nurses
could be subsumed under the Hostile classification.


The above example will also serve to illustrate 
the definitions of the five categories. In the 
Evaluative response, the nurse has made a judg- 
ment of relative goodness of the patient’s feel- 
ings, and has implied how the patient ought to 
feel and what he might do. It would follow that 
the patient might not feel free to further ex- 
plore his feelings about his physician since the 
nurse has, in effect, indicated his feelings are in- 
appropriate. 

The Hostile response in the above illustration 
again indicates to the patient the inappropriate- 
ness of his feelings and, in addition, subjects him 
to ridicule by implying that he is immature, and 
that he must accept whatever treatment is given 
him since the service is free. 

Through the reassurance given the patient in
the Supportive response, the nurse, in effect, de-
nies that the patient has a problem and implies
that he need not feel as he does. Although the
denial of his feelings may preclude further dis-
cussion (leaving the nurse with the feeling that
her reassurance has “worked”), it does not
usually change the patient’s feelings.

The Probing response implies that the patient 
might profitably discuss the point further, that if 
the patient will only give her more information, 
the nurse will be able to provide the answer or 
solution to his problem. 

By means of the Understanding response, the 
nurse indicates that she is trying to understand 
the patient’s point of view, and to communicate 
that understanding to the patient. The patient, 
feeling “safe” in such a situation, feeling that 
whatever attitudes he has are permissible, may 
now feel free to further explore, and himself 
modify, his feelings toward his physician. Fur- 
thermore, the patient, feeling that the nurse is an 
understanding person, may be generally more co-
operative in other nursing procedures.

A try-out of this test on a class of 47 nursing 
students at the University of Colorado School of 
Nursing indicates that the five categories of re- 
sponse are relatively independent, with very little 
overlap. Intercorrelations between each cate- 
gory of response with every other category 
yielded low negative correlations with the excep- 
tion of two non-significant low positive correla- 
tions. Furthermore, the test appears to be suffi- 
ciently reliable for use. Split-half reliabilities, 
correlating odd with even items, based upon the 
data of the 47 nursing students, are as follows 
for each category: Evaluative, .77; Hostile, .80; 
Supportive, .74; Probing, .88; and Understand- 
ing, .92. 
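
An odd-even split-half correlation of the kind reported above can be sketched as follows. The 0/1 item coding (1 when a subject chose the response belonging to the category on that item) and the sample data are hypothetical. No Spearman-Brown correction is applied, since the paper does not mention one.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a half-score is constant")
    return sxy / math.sqrt(sxx * syy)


def odd_even_reliability(item_scores):
    """Correlate each subject's score on odd items with the score on even items.

    item_scores holds one list per subject, with one entry per test item.
    """
    odd_half = [sum(subject[0::2]) for subject in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(subject[1::2]) for subject in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_half, even_half)


# Hypothetical data: four subjects, six items, 1 = chose an Understanding response.
understanding = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(odd_even_reliability(understanding), 2))
```

The same function would be run once per response category to produce five coefficients.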

Hypothesis 1: That the differences between 
the posttest and pretest scores for the experi-
mental group will show significantly greater de-
creases in all categories of response (except Un-
derstanding, which will show a significantly
greater increase), than for the control group.

It is obvious that this Nurse-Patient Situation 
Test is a comparatively direct measure of the 
content of the course, but may not reflect a more 
basic change in underlying attitudes of the nurse. 
It was felt, therefore, that other independent 
measures of attitude change should be included 
in the test battery. 

2. The F-Scale. As one independent measure 
of more basic changes in attitudes, the F-Scale 
was included in our test battery. This scale was 
developed in an extensive study of the “authori- 
tarian personality,” and is described in detail 
elsewhere (1). This scale measures attitudes on 
a continuum ranging from authoritarian to demo-
cratic.

Hypothesis 2: That the difference between the 
pretest and posttest scores for the experimental 
group will show a significantly greater shift to- 
ward the democratic end of the scale than for 
the control group. 

3. The Memory Test. In this procedure, a
lengthy case history, constructed from nurses’
notes on an actual patient, is read to the group.
The items in the history can be classified as
physical items (temperature, blood pressure, diag-
noses, laboratory procedures, medications, etc.),
and psychological items (patient’s ward behavior,
the degree of his dependency, his employment
history, etc.). Immediately after the history is
read, the subjects are asked to write down
everything they remember. The score on this
test is:

\[ \frac{\text{Number of physical items}}{\text{Number of psychological items}} \times 100 \]

A high ratio would indicate recall of more physi-
cal items than psychological. The rationale for
using this procedure is that the case history is
too long for the subjects to remember every-
thing; that what is remembered will be selective;
and that the course in nurse-patient relationships
will make the experimental group more sensitive
to psychological factors in the patient’s history.

Hypothesis 3: That the difference between the
pretest and posttest ratios between physical and
psychological items will show a significantly
greater drop for the experimental group than for
the control group.
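
As a worked illustration of the Memory Test score, the ratio of physical to psychological items recalled, multiplied by 100, can be computed as below. The item counts are hypothetical, and the handling of a subject who recalls no psychological items is an assumption, since the paper does not say how that case was scored.

```python
def memory_score(physical_recalled, psychological_recalled):
    """Memory Test score: physical items per psychological item, times 100.

    A score above 100 means more physical than psychological items were
    recalled.
    """
    if psychological_recalled == 0:
        # Assumption: the ratio is treated as undefined in this case.
        raise ValueError("score is undefined when no psychological items are recalled")
    return physical_recalled / psychological_recalled * 100


# Hypothetical recall counts for one nurse before and after the course.
print(round(memory_score(26, 15), 1))  # pretest
print(round(memory_score(23, 16), 1))  # posttest
```

A drop in the score from pretest to posttest is the direction predicted for the experimental group.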





Results 


Table 2 presents the pre- and posttest 
scores for both groups, and the confidence 
levels of the pre-post differences between ex- 
perimental and control groups. 

The following facts are evident in Table 2: 


1. The experimental group showed a sig-
nificantly greater decrease in evaluative re-
sponses than the control group.

2. Neither group showed a significant de-
crease in hostile responses. The exceedingly
small number of hostile responses by both
groups (out of a possible total of 35) mini-
mizes the importance of this category.

3. The experimental group showed a sig-
nificantly greater decrease in supportive re-
sponses than the control group.

4. The experimental group showed a sig-
nificant decrease in probing responses. The
control group showed a slight increase in
probing responses, although this increase is
not significant.

5. The experimental group showed a sig-
nificantly greater increase in understanding
responses than the control group.

These data support our first hypothesis:
that the experimental group would show a
significantly greater decrease in evaluative,
hostile (not significant), supportive, and
probing responses, and a significantly greater
increase in understanding responses than the
control group.

6. The experimental group showed a sig-
nificant shift in attitudes toward the demo-
cratic end of the F-Scale. The control group
showed a slight, but not significant, shift
toward the authoritarian end of the scale.
These data support the prediction in hy-
pothesis 2.

7. The experimental group showed a sig-
nificantly lower ratio between physical and
psychological items on the Memory Test.
The control group showed a slightly higher,
but not significant, ratio. These data sup-
port the prediction in hypothesis 3.


Table 2

Differences Between Pretest and Posttest Scores

                                  Experimental         Control
Test                              Pre      Post      Pre      Post

Nurse-Patient Situation Test:
  Evaluative responses           10.0      1.0      12.5     11.0
  Hostile responses               1.0       .5       1.7      1.4
  Supportive responses           10.2      3.0      10.3     10.0
  Probing responses               9.7      2.2       8.2      9.5
  Understanding responses         4.5     28.3       2.3      3.6
F-Scale                          99.7     91.0     113.0    116.1
Memory Test                     173.4    144.1     173.0    193.0
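
The pre-post changes behind these statements can be worked out from the group means. The sketch below assumes the Table 2 figures read, for each measure, as Experimental Pre, Experimental Post, Control Pre, Control Post. That column order is a reading of the printed values, checked against the text (for example, the five situation-test means in each column sum to about 35), not something the table states beyond its headings. It shows only the arithmetic and does not reproduce the significance tests.

```python
# Group means as read from Table 2 (column assignment is an assumption):
# (experimental pre, experimental post, control pre, control post)
TABLE_2 = {
    "Evaluative": (10.0, 1.0, 12.5, 11.0),
    "Hostile": (1.0, 0.5, 1.7, 1.4),
    "Supportive": (10.2, 3.0, 10.3, 10.0),
    "Probing": (9.7, 2.2, 8.2, 9.5),
    "Understanding": (4.5, 28.3, 2.3, 3.6),
    "F-Scale": (99.7, 91.0, 113.0, 116.1),
    "Memory Test": (173.4, 144.1, 173.0, 193.0),
}


def changes(row):
    """Return (experimental change, control change) as post minus pre."""
    exp_pre, exp_post, ctl_pre, ctl_post = row
    return exp_post - exp_pre, ctl_post - ctl_pre


for measure, row in TABLE_2.items():
    exp_change, ctl_change = changes(row)
    print(f"{measure:14s} experimental {exp_change:+6.1f}   control {ctl_change:+6.1f}")
```

Negative changes are decreases; on the F-Scale a decrease is a shift toward the democratic end.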


Discussion 


As previously stated, the attitudes and 
skills which we hoped to convey to the experi- 
mental group are based upon the nondirec- 
tive concepts of Rogers (4,5). It was, there- 
fore, interesting to note that the group went 
through a process similar to that in a thera- 
peutic counseling situation. Early in the 
course, many negative attitudes were freely 
expressed. As these were accepted by the 
instructor, more positive attitudes began to 
appear. The class itself noticed the con- 
spicuous change in the “climate” of the 
course. 

The question may arise whether anything more
was accomplished than training these nurses to
recognize an understanding response. But
the accompanying changes in sensitivity to 
psychological and social factors in a pa- 
tient’s case history, and the less authoritarian 
scores achieved on the F-Scale, do suggest 
that more basic changes took place. At a 
later date we plan to test the relative perma- 
nence of these changes. Several of the par- 
ticipants in the course were asked to explain 
their lowered scores on the authoritarian 
scale. The consistent response was that dur- 
ing the course they learned to respect the 
feelings of others, that patients could par- 
ticipate in the solution of their own prob- 
lems, and that these attitudes could carry 
over to other spheres. 

In addition to the changes in test findings 
there is other evidence that more than con- 
tent was learned in the course. Nurse super- 
visors have reported that most nurses in the 
experimental group are using these skills.
Several of the group have requested addi- 
tional training along the lines offered in the 
course. Even more convincing were the dif- 
ferences in understanding of patients’ feel- 
ings noted between the incidents turned in 
for discussion early in the course and those 
submitted toward the end of the course. 

This study has demonstrated that nurse- 
patient relationships making use of Rogerian 
concepts can be successfully taught. It is not 
to be inferred that a course such as that de- 
scribed in this paper is all that is necessary. 
Ideally, such a course should be taught early 
in nurses’ professional education, and fol- 
lowed by appropriate ward supervision. The 
nurses represented in this study come from 
59 nursing schools in 22 states. Yet, our pre- 
test results indicate that very few had any 
meaningful preparation in this area. The 
study by Phillips and Agnew (2) indicates 
that the technique of giving understanding 
responses is “. . . considerably more than a 
simple extension of knowledge of interper- 
sonal relations possessed by any reasonably 
intelligent and emotionally mature person.” 
In other words, such skills and attitudes can- 
not be assumed to result from general nurs- 
ing experience; they must be taught. And, 
with the current emphasis on interpersonal 
relations in nursing, the method herein de- 
scribed appears to be one manner in which 
such teaching may be accomplished. 


Summary 


Two groups of nurses—30 in an experimen- 
tal group, and 29 in a control group—took a 
battery of three pretests. The Nurse-Patient 
Situation Test measured five categories of 
nurses’ responses to patients’ statements: 
Evaluative, Hostile, Supportive, Probing, and 
Understanding. The F-Scale measured social 
attitudes on a continuum ranging from au- 
thoritarian to democratic. The Memory Test 
measured the ratio of physical items to psy- 
chological items remembered from a lengthy 
case history. 

Following the administration of the pre- 
tests, the experimental group participated in 
a course in nurse-patient relationships. An 
effort was made to conduct the course along 
the lines suggested by Rogers’ nondirective 
concepts. That is, the instructor tried to
create an atmosphere in which participants 
could feel free to express all shades of opin- 
ion and criticism in the discussion of nurse- 
patient situations. 

Upon completion of the course, both the ex- 
perimental and control groups again took the 
series of tests described above, and the dif- 
ferences between the pre- and posttest scores 
were compared. On the Nurse-Patient Situa- 
tion Test, the experimental group showed a 
significantly greater decrease in Evaluative, 
Supportive, and Probing responses, with a 
correspondingly greater increase in Under- 
standing responses than the control group. 
No significant decrease in Hostile responses 
was demonstrated by either group. How- 
ever, the exceedingly small number of Hos- 
tile responses minimizes the importance of 
this category. 

The experimental group showed a signifi- 
cant shift toward the democratic end of the 
F-Scale, while the control group showed no 
significant change. 

The ratio of physical to psychological items 
on the Memory Test showed a significant de- 
crease for the experimental group, while the 
control group showed no significant change. 

It is concluded that nurses’ skills and at- 
titudes in interpersonal relationships can be 
modified in a significant fashion when nurses 
understand the nature of the techniques they 
use, the attitudes which such techniques
express or implement, and the feelings they 
generate in patients. 


Received October 26, 1953. 
The present study is a report on an ex-
periment to evaluate: (a) the achievement
of students in terms of outcomes desired from 
a human relations course; and (b) the rela- 
tive effectiveness of two methods of teaching 
in achieving these outcomes. The course is 
one segment of a curriculum for the training 
of medical administrative supervisors in a 
military school. The experiment was con- 
ducted because of the lack of consistency in 
the findings from previous studies (1, 3, 11, 
12, 13, 14, 17, 18, 19, 21, 22, 23). It is 
based in a general way on some of the pro- 
cedures used in a previous study conducted 
by Canter (6). The present study differs 
from Canter’s in that it compares two teach- 
ing methods, uses airmen rather than insur- 
ance company supervisors as subjects and in- 
corporates a greater number of measures than 
used by Canter. 


Questions Studied 


The present study was limited to a study 
of the following questions: 

1. Is the twenty-hour block of instruction 
in human relations sufficient to increase the 
achievement level of students? 

2. If changes are made by the instruction, 
what is the extent and direction of this change 
at the: (a) knowledge level; (b) attitudinal 
level; and (c) skill level? 

3. Is one of the two methods of instruc- 
tion more effective than the other for pro- 
ducing change in the student achievement 
level? 


General Procedure 


The study was designed to take advantage of 
the best experimental procedure possible with a 
minimum disruption of course work and of rou- 
tine generally employed in the conduct of the 
course. The over-all design was simply the pre-
test, instruction, post-test design and is shown
below in more detail by steps.

* This study was conducted while the author was
on the staff of the Officer Education Division, Hu-
man Resources Research Institute, Maxwell Air
Force Base, Alabama.

Step 1—Pre-Test: All students in both the 
control and experimental groups were given all 
tests. 

Step 2—Instruction: Students were divided
into three groups for instructional purposes. Ex-
perimental group one received instruction by the
instructor-centered method. Experimental group 
two received instruction by the student-centered 
discussion method. The third group was the 
control group and received only the technical in- 
struction given in the course but did not receive 
the instruction given in human relations. 

Students were selected for one or the other of 
the teaching methods on the basis of sociometric 
leadership ratings in a group performance test 
(see below for description of the test). Accord- 
ingly, those students from the first group to be 
administered the group performance test who 
were rated 1, 3, 5 were assigned to the instructor- 
centered method and those students who were 
rated 2, 4 and 6 were assigned to the student- 
centered method. Those students in the second 
group who were rated 1, 3 and 5 were assigned 
to the student-centered method and those who 
were rated 2, 4 and 6 were assigned to the 
instructor-centered method. This procedure was 
repeated until assignments had been made for 
all individuals. Those individuals who were as- 
signed to the student-centered method were fur- 
ther sub-divided into sections of six people each 
for the actual instruction. These sub-groups 
were formed on the basis of a random sampling 
design which would assure that none of the in- 
dividuals who were in the test groups worked to- 
gether during the course. This procedure was a 
caution which assured the experimenter that one 
method would not have an advantage over an- 
other method on the group performance test as 
a result of an informal structure which might de- 
velop over a period of time during the formal 
course work. 
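
The alternating assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the experimenter's own procedure: the function name `assign_method` and the zero-based `group_index` are assumptions introduced here, and the sketch covers only the two experimental methods, since the text does not say how control students were chosen.

```python
def assign_method(group_index, rank):
    """Return the teaching method for one student.

    group_index: 0 for the first six-man group given the group
                 performance test, 1 for the second, and so on
                 (a hypothetical encoding).
    rank:        sociometric leadership rating within that group, 1 to 6.
    """
    if not 1 <= rank <= 6:
        raise ValueError("rank must be between 1 and 6")
    odd_rank = rank % 2 == 1
    even_group = group_index % 2 == 0  # first, third, fifth ... test group
    # First group: ranks 1, 3, 5 go to the instructor-centered method.
    # Second group: the pattern is reversed, and so on alternately.
    if odd_rank == even_group:
        return "instructor-centered"
    return "student-centered"


# Assignments for the first two test groups.
assignments = {
    (g, r): assign_method(g, r) for g in range(2) for r in range(1, 7)
}
```

Alternating the pattern from one test group to the next gives each method an equal share of every rank over any pair of groups, which appears to be the purpose of the rule.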

Step 3—Post-test: All students in the experi- 
mental and control groups were given all tests. 
The testing situations and schedules were exactly 
the same for each individual on both the pre- and 
post-test. 

During the course of the experiment two ob- 
servers were placed with the class given instruc- 
tion by the student-centered discussion method 
and one observer was placed with the class given 
instruction by the instructor-centered method.
These observers checked on: (a) the extent to 




which instructional content material varied be- 
tween the two classes and (b) the extent to 
which the instructor’s approach was consistently 
oriented to the instructional method he was rep- 
resented as using. Students in each class were 
also required to describe the instructional pro- 
cedure through the use of a check list. 


The Criteria 


The measures for criterion purposes were se- 
lected on the basis of the following requirements: 


1. The test should measure some aspect of hu- 
man relations ability or of leadership ability as 
established in previous studies. 

2. The test should measure some aspect of 
school objectives. 

3. The test should be dependable in its meas- 
urement properties. 

4. The test battery should represent measures 
of human relations or leadership ability at the 
knowledge, attitude and behavioral levels (see 
below for fuller descriptions of these levels). 

5. The tests should represent measures of ob- 
jectives desired in the course. 

The tests, classified according to level of measurement,
are listed and described briefly below.¹

Knowledge tests. What facts about human relations
and leadership does the student know?

1. Personnel relations test (20). Developed 
by the Air Force’s Human Resources Research 
Center for use with personnel in administrative 
positions. Measures what the student knows 
about supervisor-subordinate relationships. 

2. “How Supervise?” test (7). 

Attitude tests. How does he feel about certain 
kinds of leadership orientation? 

1. Problems of the Non-commissioned Officer
in Charge (4). A set of five scales developed
and validated at Harvard University. Measures
orientation toward discipline (severe versus not
severe); assessment of promotion practices
(perceives much wrong versus perceives little
wrong); and handling of informal pressures in
organization (try to satisfy these pressures
versus ignore pressures).

2. Leadership Opinion Questionnaire (8). Contains
two scales. One measures orientation toward
initiating structure in working with subordinates
and aggressive directing of subordinates toward
achieving the goal. The second scale measures the
extent to which the supervisor is considerate of
the feelings of those under him. Developed at
Ohio State University.

Skill tests. The skill tests were used in an attempt
to measure how the individual behaves in a realistic
situation. These are divided into Indirect measures
and Direct measures of behavior.

The indirect measures are paper and pencil
tests which establish situations for the respondent
and require him to react to these situations.
The advantage of this type of measurement is
that it is possible to present a variety of
situations to the respondent.

¹ Only the non-commercial tests are described.
Detailed descriptions of the commercially available
tests may be found in the references provided (5, 9).

The direct measures are actual situations 
wherein the individual reacts to a realistic prob- 
lem in conjunction with other individuals. 


Indirect Measures: 


1. Social intelligence test (16).

2. Prediction of human reactions test (15). 
Developed for the Detroit Edison Company. It 
was revised for this experiment to be adaptable 
to the airman population. Primarily oriented to- 
ward judging how an individual would react 
given certain characteristics of individuals and 
circumstances which might occur in a supervisor- 
subordinate relationship. 


Direct Measures: 


Students were assigned to a group composed 
of six airmen. They were told that they were to 
constitute a board to act on a morale problem 
occurring in a hospital staff. While acting as a 
board they were observed and rated by tech- 
nicians trained for this kind of observation. 
Scoring was accomplished by using Bales’s (2) 
interaction scoring form and by rating individu- 
als according to the four roles of leadership ac- 
tivity, ability, likability and contribution of best 
ideas. Ratings on the four roles were also made 
by examinees at the close of the session as well 
as by observers. Sixteen groups in all were used. 


Instructors and Instructional Method 


The instructors were selected because of their 
ability to use one of the methods. Each felt his 
competence was greatest in the method he was 
to use and was selected by his colleagues and 
superiors as being the most competent in that 
method of instruction. (Had it been possible, 
the experiment would have been replicated re- 
versing the roles of the two instructors. How- 
ever, before the next course began one of the in- 
structors was separated from the service.) 

Both instructors were carefully briefed on the 
content of the course. Each was warned not to 
emphasize content beyond that provided in the 
lesson plans. Observers were used to assure that
the instructors remained within the content
provided.

Decisions were made between both instructors 
and the experimenter as to how the instructional 
methods were to differ. Both observers and stu- 
dents rated the instructional methods on a check 
list of descriptive items. Items which, by item 
analysis using the chi-square statistic, were found 
to discriminate significantly (p < .05) between
the two methods of instruction are summarized 
qualitatively below. These items serve to de- 
scribe the conduct of the teaching method as 
perceived by the students. 





Table 1

A Comparison Between Pre-Post-Test Scores for Control and Combined Experimental Groups

Columns: Control (N = 24), Mean and S.D.; Experimental (N = 94), Mean and S.D.
Rows for each criterion measure: Pre, Post, t.
Legible entries are listed in the order scanned; "..." marks an unreadable entry.

Criterion Measure

Personnel Relations Test              24.1  4.8  ...  5.6
                                      25.3  5.0  5.2
How Supervise? Test
  Total                               ...  58.5  10.7
                                      64.4  11.1
                                      6.61**
  Supervisory Practice                12.3  ...  11.8
                                      12.3
                                      1.41
  Company Policies
  Supervisory Opinion
NCOIC
  Promotion-Orientation
  Assessment of Rewards
  Informal Pressures
  Discipline-Justice
  Discipline-Initiative
Leadership Opinion Questionnaire
  Initiating Structure
  Consideration
Social Intelligence
  Judgment in Social Situations       Pre
                                      Post
                                      t
  Observation of Human Behavior       Pre
                                      Post
Prediction of Human Reactions         2.97**

* = p .05 or < .05.
** = p .01 or < .01.
Method A
Instructor-Centered 


Suggestions were evaluated by instructor who 
advised or led class to correct conclusion. 

Techniques and steps for activities were given 
by the instructor. 

Instructor (rather than student) considers and 
handles individual problems and questions. 

The instructor is the focus of attention. Stu- 
dent to student attention happens rarely or oc- 
casionally. 


Method B 
Student-Centered Discussion 


Instructor encouraged suggestions and used this 
procedure to stimulate class to carry out class 
activities themselves. 

Techniques and steps for activities emerged 
from the group discussion. 

Group consideration of individual problems is 
encouraged by the instructor. 

The instructor is the focus of attention when- 
ever the discussion or activity needs guidance or 
information; otherwise students directed their 
attention to one another. 


Results 


Only the results obtained from a study of 
the written tests and of the sociometric data 
are reported here. 

The first hypothesis tested was that the
course, regardless of method of instruction,
produced an improvement in student achieve- 
ment level. A comparison of pre and post 
scores for the control group with pre and 
post scores for the group receiving instruction 
was made for each of the tests. This com- 
parison is shown in Table 1. 

Significant (p < .01) changes were made by 
the course segment in the knowledges related 
to human relations and leadership skills. 
These changes are reflected in the Personnel 
Relations and total How Supervise? test 
scores. The changes in the How Supervise? 
sections on company policies and supervisory 
practices were not significant although these 
tests also measured knowledge. An inspec- 
tion of these tests indicates, however, that the 
content area is more appropriate to an in- 
dustrial situation and would not be covered 
in a military course. 

Important changes were also made in stu- 
dent attitudes. The most significant change 
in the attitude area is reflected in the pre 
and post test scores of students on the How 
Supervise? section on supervisory opinion and
on the Leadership Opinion Questionnaire sec- 
tion on consideration for others. Another 
change is noted in the NCOIC Problems. 
Student responses indicated a more lenient 
attitude toward problems of discipline and 
promotion after having attended the course 
than they did prior to attendance. It is in- 
teresting to note that all groups (including 
the control group) made significant changes 
on the Leadership Opinion Questionnaire sec- 
tion on initiating structure. This change was 
in the direction of a less favorable attitude 
toward active directing and structuring of 
situations in which leadership might be dem- 
onstrated. Undoubtedly some of the change 
occurred as a result of practice effect; how- 
ever, the implication might be made that this 
change has occurred as a result of being in 
the school setting. An interesting hypothesis 
is stimulated here that the informal school 
setting, wherever it may be, may have a detrimental
effect on attitudes toward active initiation
of structure. It is doubtful that such
a change is more than a temporary one, al- 
though this hypothesis, too, should be a sub- 
ject for further investigation. 

In the area of indirect measurement of hu- 
man relations skills the course, in general, 
effected significant changes. Students taking 
the course made significant gains on the Social 
Intelligence test section on judgment in social 
situations, and the Prediction of Human Re- 
actions test. All groups (including the con- 
trol group) made significant gains on the 
Social Intelligence test section on observa- 
tion of human behavior. 

The pre-post correlations for control and
experimental groups on each of the measures
are shown in Table 2. Although the reliability
of the measures used here was available
from previous studies using these instruments,
it was desired to obtain some indication of
reliability when used with our subjects. The
pre-post correlations for the control group
reflect a measure of the test-retest reliability.
Five of the measures had pre-post
correlations of .75 to .81; five measures,
correlations of .62 to .68; three measures,
correlations of .49 to .53; and two measures had
pre-post correlations of .29 and .34.


Table 2

Pre-Post Correlations for Control and Experimental
Groups on Each of the Tests

                                           Group
                                      Control*   Experimental
Test                                  N = 24     N = 94

Personnel Relations Test              75         74
How Supervise? Test
  Total                               79
  Supervisory Practice                68
  Company Policies                    62
  Supervisory Opinion                 76
NCOIC Problems
  Promotion-Orientation               34
  Assessment of Rewards               61
  Informal Pressures                  29
  Discipline-Justice                  49
  Discipline-Initiative               51
Leadership Opinion Questionnaire
  Initiating Structure                67
  Consideration                       17         66
Social Intelligence
  Judgment in Social Situations       81         90
  Observation of Human Behavior       3          87
Prediction of Human Reactions         53         ...

* The control correlations amount to a measure of
reliability.
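
As a rough illustration of how a pre-post correlation of this kind is obtained, the sketch below computes a product-moment correlation between pre-test and post-test scores. The scores are invented for the example and the helper name `pearson_r` is an assumption; no data from the study are reproduced here.

```python
from math import sqrt


def pearson_r(pre, post):
    """Product-moment correlation between paired pre and post scores."""
    n = len(pre)
    if n != len(post) or n < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    # Sum of cross-products and sums of squares about the means.
    cross = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    ss_pre = sum((x - mean_pre) ** 2 for x in pre)
    ss_post = sum((y - mean_post) ** 2 for y in post)
    return cross / sqrt(ss_pre * ss_post)


# Hypothetical scores for six untrained (control) students.
pre_scores = [22, 25, 19, 30, 27, 24]
post_scores = [23, 27, 20, 31, 26, 25]
reliability = pearson_r(pre_scores, post_scores)
```

For a group that received no instruction between testings, a coefficient computed this way can be read as a test-retest reliability estimate, which is the interpretation the text gives to the control column.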

Canter’s (6) research, in some respects, 
was similar to the present one. His study 
was conducted with supervisors of three large 
insurance companies. A control group was 
used but only the lecture discussion method 
was used in his study. The course was the 
same length (20 hours) as the one used in 
the present study and the content was similar. 
A comparison of the results of both studies, 
on tests appearing in both studies, is shown 
in Table 3.²

The insurance company supervisors
achieved higher average scores on both the
pre-test and post-test than did the airman
population, on both tests. However, the airman
population made a significant (p < .01 >
.001) change in scores, as a result of the
course, on the Prediction of Human Reactions
test whereas a significant change was
not reported for the insurance company
supervisors. Similarly, the airman population
made gains: (a) as great as the insurance
company supervisors on the “How Supervise?”
total score; and (b) greater than the
insurance company supervisors on the “How
Supervise?” supervisory opinions score. On
the other hand, Canter reports a significant
change on the “How Supervise?” company
policy score for the insurance company
supervisors while the change for the airman
population was not significant on this
particular test.

² To reduce printing costs Tables 3, 4, 5, and 6
have been deposited with the American Documentation
Institute. Order Document 4323 from the ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for 35 mm. microfilm or
$1.25 for 6 X 8 in. photocopies. Make checks payable
to Chief, Photoduplication Service, Library of
Congress.

Knowledges, Attitudes and Indirect Meas- 
ures of Skills. As was noted earlier in this 
report, one of the weaknesses of this part 
of the study was that it was impossible to 
reverse the roles of the two instructors.
However, it is assumed that each of the
instructors was the best that could have been
obtained for using the particular method.


Students in each group rated their respec- 
tive instructors the same way with regard to 
interest of the instructor in the subject. 
They described their respective instructor 
as “being interested in the academic prog- 
ress of the students and interested in the 
students as individuals.” The gains for the 
experimental groups, contrasted with the 
gains for the control group, on each of the 
tests, are shown in Table 4. The F test was 
used for testing significance of differences in 
gains for all groups. Where a significant 
F was found, the t test was applied between 
groups. Significant differences were found by 
the t test to occur only between the experi- 
mental and control groups. 
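
The first stage of the analysis just described, an F test across the three groups' gains, can be sketched as follows. The gain scores below are invented for illustration, the function name `one_way_f` is an assumption, and the sketch assumes an ordinary one-way analysis of variance, which the text does not spell out.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means about the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Within-groups sum of squares: scores about their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)
    )
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical pre-to-post gains on one test.
lecture_gains = [6, 7, 5, 8]
discussion_gains = [5, 6, 4, 7]
control_gains = [1, 2, 0, 3]
f_ratio, df_b, df_w = one_way_f(
    [lecture_gains, discussion_gains, control_gains]
)
```

Only when an F computed in this way exceeded the tabled value for its degrees of freedom would the pairwise t tests mentioned in the text be carried out.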

In general, the evidence does not point to 
either method of instruction as being superior 
to the other. There was a general tendency, 
however, for the students taught by the lec- 
ture method to make greater gains on the 
knowledge and attitude tests than those students
taught by the discussion method.
These differences in gains made by the
experimental groups were not significant
statistically.

Leadership Skills. The measurement of 
leadership skills was conducted by placing the 
examinees in a simulated board meeting. 
The purpose of this meeting was to act on a 
morale problem which occurred in a hospital. 
During the meeting the examinees were rated 
by observers using Bales’s Interaction Process 
Analysis. Students were provided twenty
minutes to act on the problem and five addi- 
tional minutes for summarizing the discus- 
sion. After the meeting was over the re- 
spondents ranked one another from 1 through 
6 on leadership, guidance, and best ideas. 
The same procedure was followed on the 
post-test. (See the section on procedures for 
a further description of how individuals were 
assigned to sections.)

The intercorrelations between the socio- 
metric rankings are shown in Table 5. 

The changes in rankings are shown in
Table 6. This table is based on the average 
score of the individual. Those individuals 
with an average leadership score of 0 to 1.99 
were placed in category I (most leadership 
ability). Those with an average score of 
2.00 to 3.99 were placed in category II and 
4.00 to 6.00 were placed in category III (low 
leadership). 
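
The category boundaries just given can be expressed as a small function. The sketch assumes that the "average score" is the mean of the ranks an individual received from the other members of his group, which the text implies but does not state; the names `average_rank` and `leadership_category` are hypothetical.

```python
def average_rank(ranks_received):
    """Mean of the leadership ranks (1 to 6) given to one examinee."""
    if not ranks_received:
        raise ValueError("at least one rank is required")
    return sum(ranks_received) / len(ranks_received)


def leadership_category(average_score):
    """Map an average leadership score onto categories I, II and III."""
    if not 0 <= average_score <= 6:
        raise ValueError("average score must lie between 0 and 6")
    if average_score < 2.0:
        return "I"    # 0 to 1.99: most leadership ability
    if average_score < 4.0:
        return "II"   # 2.00 to 3.99
    return "III"      # 4.00 to 6.00: low leadership


# Hypothetical ranks received by one examinee from five colleagues.
example_category = leadership_category(average_rank([1, 2, 1, 3, 2]))
```

Because low ranks denote high standing, movement from category III toward category I between pre-test and post-test is what the text treats as improvement.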

There was a significant change (p > .02 <
.05) via the chi-square test of significance in
the pre-post lecture group on leadership. 
This difference is attributed largely to the in- 
crease in category I individuals and the de- 
crease in category II individuals. Five peo- 
ple were rated as I (high leadership) before 
instruction and 13 after instruction. 

A comparison of these two tables, however, 
shows some other trends which, although not 
significant, are worthy of consideration. For 
the discussion group individuals originally 
assigned to the middle category (N = 27), 
there was very little movement out of this 
category. There was, however, a considera- 
ble movement of the discussion people origi- 
nally assigned to category III. Approxi- 
mately 50% of these individuals improved 
their leadership scores on the post-test. This 
movement does not occur for students in the 
lecture group. 


Francis J. Di Vesta 


Summary 


1. A 20-hour block of instruction in human
relations, as taught in a course for airmen,
made a significant change in student
performance. Students, taken as a body, 
showed significant gains in achievement (as 
measured by the pre-test results compared 
with the post-test results) on the following 
tests: (a) Personnel Relations Test; (b) 
How Supervise? (Total Score); (c) How 
Supervise? (Supervisory Opinions); (d) 
Leadership Opinion (Consideration Score);
and (e) Social Intelligence (Judgment in 
Social Situations). 

2. Furthermore, as measured by certain of 
these same tests the course was as effective 
as a similar course given to the supervisors 
of three large insurance companies. Students 
in the Medical Administrative Supervisor’s 
course made gains as great as the insurance 
company supervisors on the “How Supervise?”
test.

3. The use of the discussion method of
teaching appeared to have a slight advantage
over the instructor-centered approach in
improving leadership ability. There was a
strong tendency for students starting the
course at a low leadership level to improve
through the discussion method. This tendency
did not exist for individuals taught by
the instructor-centered approach. Students
at the upper levels of leadership ability are
not affected much by either method. This
finding was necessarily based on a small
number of people. It should be made clear
that a tendency, not a clear-cut change, was
found. A replication of the experiment would
provide more definitive data.

4. As measured by the knowledge and at- 
titude tests, there does not appear to be an 
advantage in the discussion approach over 
the instructor-centered approach. Both meth- 
ods produced equally good results. It should 
be emphasized that this finding applies only 
to one way of using the discussion method. 
Other variations of the discussion method 
(or of the instructor method) could produce 
quite different results. 

5. There is a pronounced and significant
change in student attitude, in general, toward
initiating structure. Students, after being in
school, tend to feel that initiating structure 
in group situations is less important than 
they did before the course started. This 
change appears to occur by virtue of being in 
the school situation and is not directly at- 
tributable to a particular teaching method. 
The evidence that this occurs as a result of 
being in a school situation is that this change 
occurred for each group including the con- 
trol group. Further research would be neces- 
sary if it is desirable to know whether this is 
a temporary change or a permanent one. It 
is doubtful that the change is permanent. 
Further research would also be required to 
yield answers about how to develop a school 
atmosphere that would promote positive at- 
titude toward initiating structure. 


Received October 29, 1953. 
Vocational Interests and Socio-Economic Status


John W. Gustad 
University of Maryland 


Prominent among the factors which are 
thought to influence the choice of an occu- 
pation is socio-economic status. This may 
include both the status accorded to the oc- 
cupation by others as well as the level of 
aspiration of the individual concerned. It is 
this latter, the level of aspiration of the in- 
dividual with respect to occupations, that is 
the concern of the present investigation. 

Early in his work, Strong (9) recognized 
the need for a measure of the status aspira- 
tions of individuals completing his interest 
blank. He accordingly developed a scale 
which he called Occupational Level (hence- 
forth referred to as OL). This was accom- 
plished by contrasting the item responses of 
laboring men with those of men in business or 
the professions earning over $2500 per year. 
It should be noted that at the time this scale 
was built, this figure represented the upper 
fifth of the income distribution in this country. 

Since its publication, a considerable amount 
of research has been done on and with OL. 
It has seemed promising as a measure of 
motivation, level of aspiration, or socio-eco- 
nomic status drive. Darley (2, p. 60) has 
called it “. . . a quantitative statement of the 
eventual adult level of aspiration.” Darley 
(2), Gustad (5), Kendall (6), Ostrom (7, 8), 
and Strong (9) have shown that OL has some 
relationship to success or staying power in 
college. 

In an extensive study of OL as a measure 
of drive, Barnett et al. (1) reported several 
interesting relationships. OL was found to 
correlate .44 with a self-rating of level of 
aspiration in one school, .04 in another; with 
a verbal level of aspiration measure, .26 in 
the first school, .18 in the second. Though 
the results were not entirely clear, it was con- 
cluded that there was some relationship be- 
tween OL and other measures of level of 
aspiration. Stewart, in the same monograph, 
concluded that “. . . the mother may have a 
greater influence on the development of voca- 


tional interests than has hitherto been as- 
sumed” (p. 17). It should be noted, how- 
ever, that the sample studied was composed 
of the sons of skilled workmen. 

Recently, Gough (3, 4) has developed two
scales for measuring different aspects of
socio-economic status. One is essentially a
shortened, more easily administered version of
scales used to assess actual, objective status.
The other attempts to get at the individual’s
level of aspiration regardless of his objective
status. These will henceforth be referred to
as objective and subjective status respectively.


The Problem 


The present study was designed to answer 
two principal questions: first, how, if at 
all, do various interest groups differ in terms 
of socio-economic status, however defined; 
second, what are the relations among the 
various measures of status, all of which were 
designed to get at a common variable in 
different ways? 

The subjects were all men students in the
junior classes of the colleges of Arts and
Sciences and Engineering at Vanderbilt
University. Men were selected both because
their interests are generally better understood
and because there is no OL key on the women’s
form of the Strong.

All subjects completed the Strong Voca- 
tional Interest Blank as well as the two scales 
devised by Gough. Interest blanks were 
scored for all thirty-nine occupational keys 
as well as for the three clinical keys, OL, In- 
terest Maturity, and Masculinity-Femininity. 
Interest profiles were sorted in accordance 
with the method outlined by Darley (2) into 
primary interest groups. Those subjects who 
had no primary patterns were retained for 
study as a separate group (N.P.). Twenty- 
six cases, approximately ten per cent of the 
sample, had more than one primary.
Examination of the profiles showed that in all
but four cases one primary might be
considered to be stronger or “more primary”
than the other and was accordingly chosen.
In the remaining cases, secondary and tertiary 
patterns were inspected and a judgment made 
in favor of one or the other primary in terms 
of the total configuration. Those areas in 
which the subjects had primary patterns were 
as follows: Biological Sciences, Physical Sci- 
ences, Sub-technical, Social Welfare, Business 
Detail, Sales, and Verbal-Linguistic. 

After L1 tests indicated homogeneous variances,
analyses of variance were made for
each status measure across all interest groups. 
Product-moment correlations among the sta- 
tus measures were also computed. 


Results 


The results of the analyses of variance are 
shown in Table 1. Of the three status meas- 
ures, only OL showed significant differences 
among interest groups. 

To investigate further the situation with
regard to group differences on OL, tests of the
significance of the differences between all
means were made. These are included in
Table 2. While there were scattered significant
differences in several groups, the two
groups which appeared to be most consistently
different were the Sub-technical and
Verbal-Linguistic. The OL scores of the
former tended to be below average for the
present sample while those of the latter were
above average. These results are in close
agreement with those reported by Strong (9)
who correlated OL scores with scores on
individual scales.


Table 1

Analyses of Variance of Status Measures
Across Interest Groups

                        Status Measure
Variance                Objective   Subjective
Source        OL        Status      Status

Between       397.96    22.75       22.83
Within        18.66     11.48       8.17

Total         416.62    34.23       31.00
F*            21.33     1.98        2.79
P             <.001     >.05        >.05

* Degrees of freedom for all three measures were as
follows: for Between, 7; for Within, 244; Total, 251.
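
The F ratios of Table 1 can be checked from the printed entries. The sketch below assumes that the Between and Within entries are variance estimates (mean squares), so that each F is simply their quotient; the dictionary layout and names are illustrative only.

```python
# (between, within) variance estimates as printed in Table 1.
mean_squares = {
    "OL": (397.96, 18.66),
    "Objective Status": (22.75, 11.48),
    "Subjective Status": (22.83, 8.17),
}

# F is the ratio of the between-groups to the within-groups estimate.
f_ratios = {
    name: round(between / within, 2)
    for name, (between, within) in mean_squares.items()
}
```

Under that assumption the quotients agree with the printed F values of 21.33, 1.98 and 2.79.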

Finally, the correlations among the three 
measures were computed. They were as fol- 
lows: OL and subjective status, .07; OL and 
objective status, .10; objective and subjective 
status, — .03. None of these was statistically 
significant. Gough (3) reported a correla- 
tion of .52 between his two scales in a sample 
of high school seniors. 
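
The statement that none of these correlations was significant can be illustrated with the usual t test for a correlation coefficient. The sketch assumes that all 252 subjects (251 total degrees of freedom in Table 1) entered each correlation and that the conventional test, t = r * sqrt(N - 2) / sqrt(1 - r^2), applies; the text names neither the N nor the test, and the function name `t_for_r` is hypothetical.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


N = 252  # assumed number of subjects behind each correlation
correlations = {
    "OL and subjective status": 0.07,
    "OL and objective status": 0.10,
    "objective and subjective status": -0.03,
}
t_values = {pair: t_for_r(r, N) for pair, r in correlations.items()}

# Two-tailed .05 critical value of t for 250 degrees of freedom, about 1.97.
CRITICAL_T = 1.97
none_significant = all(abs(t) < CRITICAL_T for t in t_values.values())
```

Even the largest coefficient, .10, yields a t of roughly 1.59 on these assumptions, short of the value required at the .05 level.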


Discussion 


From the foregoing, it must be concluded
that, at least for the present sample, there is
independence among the three status measures
and significant differentiation among
interest groups only in the case of OL. Several
possibilities may account for these findings.

In the first place, only OL was specifically 
built to differentiate among occupational 
groups, but even it was not directly related 
to specific interest groups or occupations. 
Yet Barnett et al. (1, p. 13) say that “The
OL scores may be hypothesized as reflecting 
the individual’s socio-economic goals in life.” 
Further, on p. 17, they say, “The OL score 
is so constructed that it should indicate the 
socio-economic level of an individual's inter- 
ests.” In many ways, the development of 
OL was quite similar to that of subjective 
status; both involve self-descriptions about 
preferences for activities, reactions, feelings, 
etc. 

Another possibility lies in the nature of the 
sample which in the present case was drawn 
from students attending a private, fairly ex- 
pensive, above average socio-economic status 
university. These men were for the most 
part preparing for jobs in the professions or 
in business management. This is in direct 
contrast to the sample used by Stewart (1) 
described above. There may have been a 
ceiling effect operating to restrict the range 
near the upper limit. Yet if this were the 
case, such an effect should presumably have 
operated on the other scales in the same way 
as on OL, which did not happen.

It may be that OL is a more specific-to-occupations
kind of measure than the other
two scales. This should be studied, but the
manner of development of all three makes
this appear unlikely. The study cited above
(1) is again pertinent. Gough’s objective
status scale probably gives greatest weight
to factors contributed by the father. If
Stewart’s results may be accepted, the interest
group in which the individual is finally
found is more a function of maternal
influence.


Table 2

Mean Differences on Occupational Level for All Interest Groups

                                            Interest Group
Interest            Nat.      Sub.-     Soc.      Bus.                Verb.
Group         N     Sci.      Tech.     Welf.     Det.      Sales     Ling.     N.P.

Bio. Sci.     26    1.53      7.04**    2.88**    1.43      -1.58     -5.83**   .18
Nat. Sci.     24              5.51**    1.35      -.10      -3.11*    -7.36**   -1.35
Sub. Tech.    54                        -4.16**   -5.61**   -8.62**   -12.87**  -6.86**
Soc. Welf.    20                                  -1.45     -4.46**   -8.71**   -2.70*
Bus. Det.     23                                            -3.01*    -7.26**   -1.25
Sales         49                                                      -4.25*    1.76
Verb.-Ling.   9                                                                 6.01**
N.P.          47

Total         252

* Denotes significant at or beyond the .05 level.
** Denotes significant at or beyond the .01 level.


What is probably needed is more work on
the nature and dimensions of vocational
interests as well as on socio-economic status.
There are some contingencies, for instance,
which should be considered. An individual
from a high status home might have what is
for him a low status score and yet still be
average or above. Similarly, another person
from a low status home might have what for
him is a very high status score and he too
might be average.


Conclusions


1. Of the three status measures studied,
only OL differentiated significantly among
the interest groups.

2. Study of the mean differences with respect
to OL showed that those individuals
with Sub-technical interests tended to have
below average OL scores while those with
Verbal-Linguistic interests tended to be above
average on OL.


3. There was no significant correlation 
among the three status measures in the pres- 
ent sample. 


Received September 25, 1953. 
Permanence of Interests and Interest Maturity ¹


Many counselors in working with college
and precollege youth use Strong’s Vocational 
Interest Blank. In using this blank, or any 
other interest inventory, they are concerned 
with the problem of the permanence of scores 
obtained on the blank. The Strong blank 
has a scale, Interest Maturity, which is used 
by many counselors as a measure of the prob- 
able stability of a counselee’s interest profile. 
They assume a positive relationship between 
stability of interests and Interest Maturity 
score. There is, however, very little or no 
evidence to support this assumption. For an 
account of how the Interest Maturity scale 
was constructed and the evidence for its re- 
lationship to permanence of interests, see 
Strong (4). The present study was designed 
to test whether or not scores on the Interest 
Maturity scale are related to interest stability. 


Method 


In 1949 the Vocational Interest Blank was of- 
fered on an optional basis to all high school 
seniors who participated in the state-wide testing 
program in the State of Minnesota. Approxi- 
mately 3500 senior boys completed the blank. 
These completed blanks were made available to 
the investigator by the Student Counseling Bu- 
reau at the University of Minnesota. 

A check was made of the University of Minne- 
sota enrollment in 1951 to determine how many 
of these boys were enrolled at that time. It was 
found that 331 boys who had completed the 
Strong in 1949 were enrolled. A sample of 206 
of these boys was contacted and asked to again 
complete the Strong blank; 182, 88 per cent, 
complied with this request. One subject omitted 
a number of items making his blank unusable so 
that tests and retests for 181 subjects were used 
in the study. 

The minimum time between test and retest 
was two years and the maximum time did not 
exceed 2.5 years. The mean age of the 181 sub- 
jects at the time of the retest was 19.8 years. 

¹ This paper is based upon a portion of a Ph.D. 
thesis submitted to the graduate faculty of the 
University of Minnesota. The author wishes to 
acknowledge the guidance of his advisor, Dr. Willis E. 
Dugan. 

The tests and retests for the 181 subjects were 
scored for forty-four occupational scales and for 
Interest Maturity, Occupational Level and Mas- 
culinity-Femininity. To determine whether In- 
terest Maturity score was related to stability of 
interests, some measure of stability for the indi- 
vidual was needed. Kendall’s (2) coefficient of 
concordance, W, was used for this purpose. The 
coefficient W is based on the method of ranks. 
It is related to Spearman’s rho but has the ad- 
vantage of being appropriate for any number of 
observations, whereas rho is applicable only to 
two sets of data. In the present study, rho 
would also have been appropriate as only two 
sets of data were used. 

Coefficients of concordance were computed be- 
tween each individual’s test and retest profile. 
The forty-four occupational scales were used in 
computing this coefficient. 

The subjects’ interest profiles were arbitrarily 
divided into three groups of approximately equal 
size on the basis of their W values. Those with 
coefficients of concordance between test and re- 
test of .906 to .977 were designated as a “high” 
stability group (N = 60), those with concord- 
ance values of .820 to .905 were designated as an 
“average” stability group (N = 61), and those 
with concordance values of .419 to .818 were 
designated as a “low” stability group (N = 60). 
The Interest Maturity scores of these three 
groups on the first test, i.e., the 1949 test, were 
then compared. 


Results 


The coefficients of concordance are of in- 
terest themselves as a measure of the sta- 
bility of individual Strong profiles. The 
range of coefficients was from .42 to .98 with 
a median of .87. All but fifteen of the 181 
coefficients were significantly greater than 
zero at the .01 level. Since W has a direct 
relationship to Spearman’s rho, these figures 
can also be expressed in terms of rho. The 
median rho would be .74. 
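
The conversion from W to rho quoted above can be checked numerically. For m sets of untied rankings, the average Spearman rho between pairs of rankings equals (mW - 1)/(m - 1), which for the two rankings used here reduces to rho = 2W - 1. The sketch below is illustrative only: the function names and the six-scale profile are invented for the example and are not data from the study.

```python
from statistics import mean


def ranks(scores):
    """Rank scores from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def kendall_w(rankings):
    """Kendall's coefficient of concordance for m untied rankings of n items."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    s = sum((t - mean(totals)) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman_rho(rank_a, rank_b):
    """Spearman's rho from two untied rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def rho_from_w(w, m=2):
    """Average rho implied by W for m rankings (rho = 2W - 1 when m = 2)."""
    return (m * w - 1) / (m - 1)


# Hypothetical test and retest scores on six scales (not from the study).
test = [45, 30, 52, 18, 39, 27]
retest = [41, 33, 50, 22, 29, 31]
r_test, r_retest = ranks(test), ranks(retest)

w = kendall_w([r_test, r_retest])
rho = spearman_rho(r_test, r_retest)

# The reported median W of .87 corresponds to a rho of about .74.
median_rho = rho_from_w(0.87)
```

For the invented profile, W and rho computed separately agree with the identity, and the reported median W of .87 converts to a rho of about .74, the figure given in the text.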

The means and standard deviations of the 
Interest Maturity scores for the “high,” 
“average,” and “low” interest stability groups 
are given in Table 1. Bartlett’s test for 
homogeneity of variance indicated that the 
variances were homogeneous (P > .05). An 
analysis of variance of the Interest Maturity 
scores (Table 2) showed that no significant 
difference existed between the means of the 
three stability groups (P > .05). 


Table 1 

Means and Standard Deviations of the Interest Maturity 
Scores for Subjects with High, Average, and 
Low Coefficients of Concordance 

W value                  Mean    S.D. 

High (.906-.977)         46.7    8.8 
Average (.820-.905)      48.6    7.0 
Low (.419-.818)          46.5    8.9 


Table 2 

Analysis of Variance of Interest Maturity Scores for 
Subjects with High, Average, and Low 
Coefficients of Concordance 

Source      df    SS          MS        F        P 

Between       2     159.73    79.865    1.177    >.05 
Within      179   12142.09    67.833 
Total       181   12301.82 

Note: Bartlett’s test for homogeneity of variance: 
chi square = 4.17; P > .05. 


Although the Interest Maturity scores on 
the first test did not differ for the three 
stability groups, the mean Interest Maturity 
score for the entire sample of 181 increased 
from 47.2 on the first test to 52.0 on the 
retest. This increase was significant at the .01 
level. 

These results fail to substantiate the 
assumption of a positive relationship between 
interest stability and Interest Maturity score. 
From this, one may conclude that the present 
Interest Maturity scale is not useful as 
a means of estimating the probable stability 
of a precollege male’s interest profile. More 
useful would be a key built by contrasting 
the responses of persons whose interests 
remain stable with the responses of persons 
whose interests do not remain stable over a 
period of time.² 

² The Student Counseling Bureau at the University 
of Minnesota has begun work on such a key. 


Summary 


A sample of 181 males who had completed 
Strong’s Vocational Interest Blank as high 
school seniors were retested two years later 
as college students. Using Kendall’s coefficient 
of concordance, W, as a measure of 
the relationship between the test-retest profiles, 
coefficients were computed for each of 
the 181 pairs of profiles. When those individuals 
with high (N = 60), average (N = 61), 
and low (N = 60) W values were compared 
with respect to Interest Maturity score 
on the first test, they were found to be 
homogeneous. Thus, the results of this study do 
not support the assumption of a positive 
relationship between interest stability and 
Interest Maturity score. 


An awareness that prediction of college 
grades by means of “intellectual” factors had 
reached a point of diminishing returns stimu- 
lated extensive efforts to explore the predic- 
tive value of other personality variables. A 
number of authors (2, 3, 11) have reviewed 
these attempts. Thus far, the coefficient of 
alienation left by the present predictors has 
been little reduced. 

The Strong Vocational Interest Blank for 
Men (SVIB), because of the reliability of 
its scales and its wide usage, has been in- 
cluded in much of the research on academic 
achievement. Most of these studies have 
been summarized by Strong (10). A di- 
versity of designs, definitions of achievement, 
and instruments, often along with methodo- 
logical deficiencies, renders interpretation of 
results difficult. It does appear, however, 
that keys for the Strong (e.g., 12, 13) have 
been developed which can add to the predic- 
tion of college grades afforded by intelligence 
test scales. Yet, such scales fell into disuse 
through their failure to add to a predictive bat- 
tery which includes secondary school grades. 

In the authors’ (8, 9) preliminary investi- 
gation of personality variables associated with 
academic achievement, the different definition 
of achievement used appeared to warrant a 
re-examination of this factor’s relationship to 
SVIB. 

At present, the best available estimate of 
an incoming freshman’s grades at Yale Col- 
lege is his “general predicted score” (1). 
This measure is the dependent variable in a 
multiple regression equation of which the 
three independent variables are: (1) adjusted 
secondary school record; (2) Scholastic Apti- 
tude Test score (SAT); and (3) the total of 
three College Entrance Board Examinations 
(CEEB). Achievement was measured in 
terms of deviation from predicted score. 
Thus, unlike most previous studies, the in- 
vestigation was concerned with achievement 
beyond that predicted by a battery which includes 
secondary school record. A recently reported 
study by Melville and Frederiksen (5) on freshman 
engineering students at Princeton also used 
adjusted secondary school grades 
as one component of the predicted score. 

Though the results yielded by the first ex- 
perimental groups (Yale College classes of 
1950 and 1951) gave little promise that the 
Strong would have practical prediction value 
for academic achievement, the scales showed 
more than a chance relationship to achieve- 
ment status.’ Further, the significant results 
obtained for Group V scales and the Mascu- 
linity-Femininity (M-F) scale appeared to 
offer indirect support for a hypothesis de- 
veloped earlier (8, 9). For these reasons, 
SVIB was included in a battery administered 
to three other sets of experimental groups. 
Data were available for an additional group. 
These extensive replications permit an im- 
proved appraisal of SVIB as it relates to col- 
lege academic achievement. 

Subjects 

Selection of subjects was based on the rela- 
tionship between grades and predicted scores. 
The procedure, described elsewhere in greater 
detail (8, 9) was designed to yield three 
groups of Yale undergraduates who would be 
equated for predicted score, but who would 
differ widely in grades. The regression line 
of grades on predictions was drawn on a 
scattergram. Lines parallel to the regression 
line were drawn so as to cut off approximately 
the most extreme ten per cent of both the 
positive deviants from predicted score, over- 
achievers (O’s), and the negative deviants, 
underachievers (U’s). A third group, normal- 
achievers (N’s), included those students in cells 
cut by the regression line. This procedure 
was applied to each sample separately. Sub- 
jects included in four samples had accepted 
invitations to participate in the study and 
were tested in either the junior or senior year. 
In one sample, students were routinely ad- 
ministered the test during the freshman year.’ 
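
The selection procedure described above can be sketched in code. This is a minimal illustration, not the authors' procedure: the data are synthetic, the function names are invented, and the "normal" zone is approximated by a fixed band around the regression line (the `band` parameter, an assumption) rather than by scattergram cells cut by the line.

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept for the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def classify(predicted, grades, tail=0.10, band=1.0):
    """Label each student 'O', 'U', 'N', or None by deviation from the line.

    tail: fraction cut off at each extreme (about ten per cent in the text).
    band: half-width of the zone around the line treated as normal
          achievement (an assumption made for this sketch).
    """
    slope, intercept = fit_line(predicted, grades)
    resid = [g - (intercept + slope * p) for p, g in zip(predicted, grades)]
    ordered = sorted(resid)
    k = max(1, round(tail * len(resid)))
    low_cut, high_cut = ordered[k - 1], ordered[-k]
    labels = []
    for r in resid:
        if r >= high_cut:
            labels.append("O")   # overachiever: far above prediction
        elif r <= low_cut:
            labels.append("U")   # underachiever: far below prediction
        elif abs(r) <= band:
            labels.append("N")   # normal-achiever: close to the line
        else:
            labels.append(None)  # between the zones: not sampled
    return labels


# Synthetic class of 200 students (illustrative values only).
rng = random.Random(0)
predicted = [rng.gauss(77, 5) for _ in range(200)]
grades = [p + rng.gauss(0, 6) for p in predicted]

labels = classify(predicted, grades)
counts = {g: labels.count(g) for g in ("U", "N", "O")}
```

Because least-squares residuals are uncorrelated with the predictor, groups cut off by lines parallel to the regression line tend to have similar mean predicted scores while differing in grades, which is the property the design relies on.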

Table 1 indicates that the combined experimental 
groups do not differ in predicted scores 
or components of predicted scores, but differ 
significantly in academic average. 

¹ Strong scores for this group were obtained by 
J. R. Wittenborn. 


Table 1 

Comparison of Groups on Prediction, Components of Prediction, and Average Grade 

                    Prediction     Adjusted       SAT            Average        Average 
                                   School Grade                  CEEB           Grade 
                    M      SD      M      SD      M      SD      M      SD      M      SD 

Underachievers      76.9   4.9     76.5   5.9     58.7           57.8   5.6     71.4   3.5 
Normal-achievers    77.0   4.9     76.5   5.8     59.0           57.5   6.5     77.9   3.2 
Overachievers       77.4   4.9     77.2   5.6                    57.7   6.1     84.3   3.4 
Total               77.1   5.0     76.7   5.8                    57.6   6.1     78.2   6.1 
CR (U vs. N)          .3             .1                                         17.1 
CR (U vs. O)          .8            1.0                                         32.2 
CR (N vs. O)          .6            1.2                                         18.3 


Results 


The ability of SVIB occupational scales to 
separate the experimental groups is shown in 
Table 2. Comparisons among groups were 
made by means of chi-square with the cut-off 
point for each scale taken at the median 
of the total group. None of the 44 scales 
differentiated U’s from N’s. Overachievers 
differ significantly (p < .05) from U’s on 11 
scales, and from N’s on 12 scales, nine of 
these scales being the same. These significant 
differences are as follows: overachievers score 
higher than both other groups on scales for 
Artist, Psychologist, City School Superintendent, 
Minister, Musician and C.P.A. They 
also score higher than N’s on Mathematician, 
Group II, and Group X. Overachievers score 
lowest on Sales Manager, Real Estate Salesman, 
and M-F. In addition, they score lower 
than U’s on Aviator and Forest Service. 

A key² for each achievement group was 
developed from item analyses of two samples. 
Uncorrected odd-even reliabilities are .43 for 
the U key, .38 for the N key, and .42 for the 
O key. Results of the application of these 
keys to the remaining three samples combined 
are shown in Table 3. Results of subtracting 
each subject’s U score from O score (O-U 
score) are also shown. The U key and the 
O-U score yield significant differences between 
O’s and U’s, and O’s and N’s, whereas the O 
key differentiates only O’s from U’s. The N 
key produces no significant differences. None 
of the keys yields significant differences 
between U’s and N’s. 

² To reduce printing costs, these keys and the 
method used to select items for the keys have been 
deposited with the American Documentation Institute. 
Order Document No. 4324 from the ADI Auxiliary 
Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remitting 
in advance $1.25 for 35 mm. microfilm or $1.25 for 
6 × 8 in. photocopies. Make checks payable to 
Chief, Photoduplication Service, Library of Congress. 

A measure of the congruency between 
stated occupation choice and Strong scores 
was available for 265 subjects. The per- 
centage of each group receiving A scores on 
the Strong in their occupational choice is: 
U’s, 36.7; N’s, 34.3; and O’s, 32.2. The 
differences among groups are not significant. 
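
The median-split comparisons reported in this section can be sketched as a 2 × 2 chi-square. The example takes the Masculinity-Femininity row of Table 2, read here as 57.7 per cent of 175 normal-achievers and 40.1 per cent of 166 overachievers above the median. The counts are reconstructed from the printed percentages by rounding, and the absence of a continuity correction is an assumption; the function names are invented for the illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the
    table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def above_below(percent_above, n):
    """Convert a printed percentage into (above, below) counts by rounding."""
    above = round(percent_above / 100 * n)
    return above, n - above


CRITICAL_05 = 3.841  # chi-square critical value, 1 df, .05 level

n_above, n_below = above_below(57.7, 175)  # normal-achievers
o_above, o_below = above_below(40.1, 166)  # overachievers

chi2 = chi_square_2x2(n_above, n_below, o_above, o_below)
significant = chi2 > CRITICAL_05
```

Under these assumptions the computed value is about 10.26, which agrees with the chi-square printed for the N vs. O comparison on that scale.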


Discussion 


Though a clear interpretation of results is 
hampered by the empirical nature of the in- 
strument, two aspects compel attention. The 
occupational scales apparently distinguish 
among the achievement groups with more 
than chance frequency and consistency.³ 
Scoring keys, empirically developed from two 
samples, separated the remaining samples 
with statistical significance. Both of these 
events can be viewed as evidence that achieve- 
ment as measured in this study is not an 
artifact produced by the unreliabilities of 
either the predictors or grades. Further, evi- 
dence is supplied that there is a relationship 
between achievement status and responses to 
the Strong items. 

The ability of empirical scoring keys of 
low reliability to separate the experimental 
groups offers some promise for eventual de- 
velopment of keys which would add signifi- 

³ Since scale scores cannot be treated as independent 
events, estimates of chance expectancy can only 
be approximately determined. 


Table 2 

Achievement Status and Strong Vocational Interest Blank Scores 

                       Per Cent above Median          Chi-Squares Significant at .05 Level 
                   Under-      Normal-     Over- 
                   achievers   achievers   achievers 
                   N=139       N=175       N=166      U vs. N    U vs. O    N vs. O 


Artist 46.0 43.4 5 ~ (3-2) 6.67 (5-0)! 10.35(4*-1) 
Psychologist 43.9 40.6 ; — (3-2) 9.38 (5*-0) 14.85(5*-0) 
Architect 53.2 46.3 53. — (3*-2) (3-2) (4-1) 
Physician 49.6 46.3 e —- (4*-1*) (3-2) ~ (4*-1) 
Dentist 43.8 46.9 53. — (2-3) ~ (4-1) — (4-1) 
Group I? 46.5 48.7 — (3-1) (4-0) - (3-1) 
Mathematician 48.9 42.3 — (41) — (41) 7.33(S*-0) 
Engineer 54.7 48.6 , — (4*-1) (5-0) (3-2) 

Chemist . 48.0 ) — (4-1) — (3-2) — (4-1) 

Group IP 3 41.9 h — (3*-1) — (3*-1) 6.39(3*-1) 
Prod. Mgr. 5. 51.4 ' = (3-2) —-(#1) £=x— (8-2) 
Aviator*® 55.2 52.1 ‘ — (3-1) 4.26(4-0) (3-1) 
Farmer* 53. 53.0 : - (1-3) — (3-1) (3-1) 
Carpenter* 50. 50.9 . — (2-2) — (3*-1) — (3*-1) 
Printer® 51. 49.6 J - (2-2) — (2-2) — (2-2) 
Math.-Sci. Teacher 5. 47.4 : - (4-1) ~ (4*-1) (3-2) 
Policeman? 51. 52.1 y — (2-2) ~ (3*-1) (3-1) 
Forest Service® 56. 50.4 3. - (40) (2*-2) 
YMCA Phys. Dir. ’ 46.3 53. - (3-2) - (3*-2) 
Personnel Dir. a 49.7 51. — (3-2) 3- - (1*-4) 
YMCA Secretary 5. 47.4 3. — (3-2) — (2-3) 
Social-Sci. Teacher 3. 48.0 54. — (3*-2) — (2*-3) 
City School Supt. > 46.3 ‘ — (4*-1) 11.80(5**-0) 7.87(5*-0) 
Minister 5. 45.7 ’ — (2-3) 6.67 (5-0) 7.83(4*-1) 
Group V 50.3 54. — (4*-1) — (4-1) — (2*-3) 
Musician 46.9 57. — (2-3) 4.22(5-0) 4.11(4-1) 


CPA. 434 59. — (2-3) §.42(3.5*-1.5)  8.66(3*-2) 





























Accountant : 53.1 52. — (3-2) — (4-1) - (2-3) 
Office Worker 50. 52.0 : — (4-1*) — (2.5-2.5) - (4-1) 
Purchasing Agent F 52.6 J — (3-2) — (4-1) — (4-1) 
Banker® ’ 56.4 : — (3-1) — (1.5-2.5) — (3-1) 
Group VITP 51. 52.1 — (3-1*) — (2*-2) —~ (3-1) 


Sales Manager — (3-2)8 3.95(5*-0) 4.47(4-1) 
Real Estate Salesman 54. 57. . — (3-2) 4.29(5-0) 7.61(5**-0) 
Life Insur. Salesman 50. — (2-3) ~ (4-1) - (3-2) 
Group TX? ; - (3-1) - (40) (3-1) 
Advertising Man . — (3-2) — (3°-2) — (2°-3) 
Lawyer . 57.8 - (3*-2) - (3*-2) — (4*-1) 
Author-Journalist 5 57.8 - (4-1*) - (3*-2) (3*-2) 
Group X* 3. 54.1 — (3-1) - (2**-2) 4.29(2**-2) 
President? R 46.6 — (2-2) - (3-1) - (3-1) 
Occupational Level? , 59.1 — (3-1) - (3*-1) — (2*-2) 
Masc.-Fem. . 57.7 40.1 — (3-2*) 8.23 (4*-1) 10.26(5*-0) 
Interest Maturity” . 48.7 55.6 — (3-1*) — (4-0) — (1*-3) 














¹ The first number in parentheses gives the number of times the direction of the results of the samples was 
the same as that of the combined group. The second number indicates reversals. Asterisks are added for each 
sample significant at the .05 level or .01 level or better. 

² Comparisons based on four samples. Number of subjects: U = 114; N = 117; O = 133. 

³ Normalachievers exceed underachievers in three samples. 


Table 3 

Application of Achievement Keys to a New Sample 

                    Per Cent above Median             Chi-Squares Significant 
                                                      at the .05 Level 
               Under-      Normal-     Over- 
               achievers   achievers   achievers 
               N=86        N=81        N=98           U-N      U-O      N-O 

U key           59.3        51.9        36.7           —       9.37     4.11 
N key           50.0        49.4        39.8           —        —        — 
O key           38.4        49.4        56.1           —       5.78      — 
O-U score       41.9        45.7        65.3           —      10.16     6.94 


cantly to the present predictors. However, 
the wide variation in the nature of the items 
comprising the key points to the difficulty of 
identifying variables by this approach. 

Original impetus for the inclusion of the 
Strong in the present diagnostic battery came 
from its earlier apparent support of a 
hypothesis developed elsewhere (8, 9). Stated 
briefly, it was hypothesized that the extent to 
which behavior favorable to high grades will 
persist at the college level will be a function 
of the degree to which certain moral and 
cultural values have been internalized, i.e., 
positive deviation from predicted scores will be 
directly related to the phenomena variously 
labelled “superego,” “conscience,” “moral 
fiber,” “goodness,” etc. 

Table 2 shows that in Group V, the so-called 
“goodness” group, when the samples are 
combined, on no scale do half of the U’s exceed 
the median score. Correspondingly, more 
than half of the O’s exceed the median on all 
Group V scales. Further, on three of the 
scales, the O’s exceed the U’s in all five 
samples. These findings, along with the 
incidence of statistical significances, indicate a 
relationship of Group V scales to academic 
achievement. This relationship was also 
found by Melville and Frederiksen (5) and 
by Morgan (6). Though the results are in 
the direction which the hypothesis would 
predict, it is still difficult to gauge the amount of 
support given to it. The instrument is 
empirical and any argument of support for the 
hypothesis must obviously involve a certain 
amount of tenuous reasoning. 

The earlier finding that O’s obtain lower 
M-F scores had been viewed as possible 
support for the hypothesis. This finding was 
corroborated by results yielded by additional 
samples. But, again, the difficulties in 
interpretation outlined above prevail. 


The Group V scales merit special atten- 
tion because of their possible measurement 
of a variable hypothesized to be related to 
achievement. Nevertheless, other occupa- 
tional groups show similar discriminatory 
ability. In addition to Group V, O’s score 
highest on occupational Groups I and X and 
also on the scales for Musician and C.P.A. 
Overachievers score lowest on occupational 
groups IV and IX. These findings seemingly 
do not bear on the authors’ present hy- 
pothesis. It does seem, however, that high 
scores on occupations requiring extensive 
academic training are positively related to 
achievement. 

Considerable agreement is found between 
our results and those of Melville and Frede- 
riksen (5). The lack of greater agreement 
(especially on groups II and IV) may be 
due to differences in subjects. Melville and 
Frederiksen tested freshmen engineering stu- 
dents whereas the bulk of the present study’s 
subjects were liberal arts students. Perhaps 
there are some personality or interest factors 
related to general academic achievement and 
others related to specific achievement. 

Kendall (4) and Ostrom (7) found a posi- 
tive relationship between achievement and 
O-L scores. The differences in design between 
these studies and the present one render 
direct comparisons difficult. Nevertheless, the 
results of the present study are in the same 
direction as those obtained by these authors. 

Common among educators is the assump- 
tion that many students fail to achieve be- 
cause of a disparity between occupational 
aims and measured interests. The results of 
this study fail to show that achievement is 
related to the congruency of occupational 
choice and scale scores. Morgan (6), using 
somewhat different criteria of achievement, 
found a negative relationship between such 
congruency and achievement. 

A comparison among the experimental 
groups in the combined samples yields 23 
differences which are significant at less than 
the .05 level of confidence. Yet, none of 
these differences is obtained between U’s and 
N’s. Further, the empirical achievement 
keys, though able to distinguish the O’s from 
both other groups, are unable to separate 
the U’s from the N’s. This result may be: 
(a) an artifact produced by the instrument; 
(b) due to the curvilinear nature of the vari- 
ables related to achievement; or (c) pro- 
duced by the academic structure which places 
a premium on overachievement while elimi- 
nating the extreme underachievers. 

The suggestion of a curvilinear relationship 
between achievement and some variables has 
important implications for experimental design.⁴ 
A large portion of published findings 
in this area is based on two contrasted 
achievement groups. This implicit assump- 
tion of linearity may be unjustified. 
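
The point can be illustrated with three ordered groups. Under a linear trend, and assuming for illustration only that the groups are equally spaced on the achievement continuum, the step from U to N should be about as large as the step from N to O. The sketch below uses the O-U score percentages as read from Table 3; the variable names are invented for the example.

```python
# Per cent above the median on the O-U score (Table 3).
pct = {"U": 41.9, "N": 45.7, "O": 65.3}

# Successive steps between adjacent achievement groups.
step_u_to_n = pct["N"] - pct["U"]
step_n_to_o = pct["O"] - pct["N"]

# Under a strictly linear trend over equally spaced groups the two
# steps would be equal, so their difference measures the departure.
departure = step_n_to_o - step_u_to_n

# A design contrasting only U's and O's observes this single
# difference and cannot see how it is divided between the two steps.
two_group_contrast = pct["O"] - pct["U"]
```

Here the step from N to O (about 19.6 points) is several times the step from U to N (about 3.8 points), a pattern that a two-group contrast of U's and O's alone would not reveal.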


Summary 

The Strong Vocational Interest Blank for 
Men (SVIB) was administered to three 
groups of subjects (designated as under- 
achievers, normalachievers and overachiev- 
ers) who were equated for general predicted 
score, but who differed in academic achieve- 
ment. The blanks were scored for all oc- 
cupations and comparisons among the groups 
were made by means of the chi-square tech- 
nique. Empirical scoring keys were de- 
veloped from two samples and cross-validated 
against three other samples. 

1. The incidence of significant results ob- 
tained with both the occupational scales and 
the empirical keys was viewed as a demon- 
stration of some relationship between achieve- 
ment status and response to the Strong items. 

2. The discriminatory ability of the Group 
V scales was regarded as lending possible sup- 
port to the hypothesis that deviation from 
predicted grades is associated with a variable 
described as acceptance of or conformity to 
certain cultural values. 

3. In general, the Strong does not seem 
highly appropriate for the measurement of the 
theoretical variable specified in the hypothesis. 

4. Congruency between stated occupational 
aims and interests as measured by the Strong 
does not appear to be related to academic 
achievement. 

5. Scale scores do not show a linear 
relationship with achievement; the overachievers 
appear to be the discrete group. 

⁴ The author’s results with the Rorschach (9) 
also indicate a curvilinear relationship similar to 
that presented here. 

Received October 5, 1953. 


Surprisingly few long-term follow-ups have 
been made on the Strong Vocational Interest 
Blank, when one considers that the test has 
now been in use two decades. The largest 
studies are nine and ten year follow-ups re- 
ported in Strong’s original volume (21), which 
are later supplemented by a twenty year re- 
port (22) on the same group. Unfortunately, 
as Super remarks (24), “the data are not so 
organized as to show what percentage of men 
entered and remained in fields in which they 
made A, B+, or lower scores.” Instead, 
Strong (21) adduces support of four rather 
indirect propositions: 


1. Men continuing in occupation A obtain 
a higher interest score in A than in any other 
occupation. 

2. Men continuing in occupation A obtain 
a higher interest score in A than other men 
entering other occupations. 

3. Men continuing in occupation A obtain 
higher scores in A than men who change from 
A to another occupation. 

4. Men changing from occupation A to oc- 
cupation B score higher in B prior to the 
change than in any other occupation, includ- 
ing A. 

A special twenty-year follow-up by Strong 
(23) dealt with medical interests only but 
was reported in a more direct manner. Of 
108 Stanford alumni who were physicians 
twenty years after testing, Strong reports 
that 70 had A ratings on the Physician scale 
in their undergraduate tests and 14 received 
a rating of B+. In all, then, 78% of these 
men who made careers as doctors had had a 
“high” physician score when tested in college. 


Procedure 


The Sample. A series of 63 participants in 
the Study of Adult Development (then known 
as the Grant Study) were given the Strong 
Vocational Interest Blank by Dr. F. L. Wells in the 
academic year 1939-1940. These young men 
were part of a longer series selected for 
interdisciplinary long-term study on the basis of their 
apparent “normality.” All were at the time 
sophomores in Harvard College. Heath (8) has 
described the original program in detail. 

* From the Study of Adult Development (the 
Grant Study), Department of Hygiene, Harvard 
University. 

This Study probably has the lowest rate of 
drop-outs of any existing longitudinal program. 
Of the 63 men given the Strong in 1939, only 
one has requested to be excused from further 
participation; all the rest are in close touch. It 
happens that the drop-out can be used in nu- 
merical summaries, since his occupation is known 
from perfectly public sources. Two men were 
lost during World War II, however. 

We have, then, 61 cases on which to test the 
predictive power of the Strong over a fourteen 
year interval, from 1939 to 1953. 

SVIB as a Predictor. How well did the Strong 
taken in college predict the occupations of these 
men fourteen years later? The basic detailed 
data for answering this question are given in 
Table 1.¹ Reported in Table 1 is the current 
job-title and the name of the Strong scale 
regarded as falling nearest to that job-title. Most 
selections are self-explanatory. One was 
semi-empirical; there being no scale for applied 
economists, it turned out that Office Man often came 
nearest. Disguises occur but only in the form 
of generalizing the job-title to make it less 
individually identifiable. The last two columns of 
Table 1 are mildly subjective evaluations by the 
investigator. It seemed necessary to specify 
whether a scale offered a “Direct” or an 
“Indirect” measure of interest in the occupation 
entered. The indirect measures are often no 
fair test at all, yet a counselor might in practice 
be forced to make just this sort of inference 
(e.g., using the Author-Journalist scale to assess 
the advisability of teaching Drama) for the lack 
of other evidence. In the last column, an 
assessment of the correctness of prediction is made 
in terms of “Good Hits,” “Poor Hits,” and 
“Clean Misses.” The definitions of these terms 
are implicit in the claims made by Strong; he 
feels that a good hit may be counted when a 
man enters an occupation for which he scored A 
or which had the 1st, 2nd, or 3rd highest ranking 
score on his test. Less credence is given to 
a B+ score when it is outranked by many others, 
yet such scores are usually regarded as “worth 
some consideration” in counseling. They are 
here called “Poor Hits.” Anything below these 
criteria is taken to be a “Clean Miss.” 

¹ To reduce printing costs, Table 1 has been 
deposited with the American Documentation Institute. 
Order Document No. 4325 from the ADI Auxiliary 
Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remitting 
in advance $1.25 for 35 mm. microfilm or $1.25 for 
6 × 8 in. photocopies. Make checks payable to 
Chief, Photoduplication Service, Library of Congress. 


Table 2 

Fourteen Year Validation: Strong Vocational 
Interest Blank 

                    Validity 
                Direct    Indirect 

Good Hit          22         5 
Poor Hit           7         5 
Clean Miss        14         7 

Total             43        17 


Sixty cases could be used for validation, 
one man (No. 63) being in an occupation for 
which no scoring scale seemed even indirectly 
pertinent. It becomes apparent by inspec- 
tion of Table 2 that some accuracy is lost 
through the necessity of using indirect meas- 
ures. The fairest evaluation of the Strong’s 
predictive power may be had from the 43 
men whose occupations can be directly tested. 
Of these, only one-third are Clean Misses. 
Just half were hit well. 

These figures are slightly lower than those 
given by Strong in his follow-up of medical 
interests. There, about one out of four tests 
turned out to be complete misses. Yet one 
must remain pleased with an instrument that 
under “blind conditions” (these tests were all 
unscored until 1952) predicts future behavior 
even half the time. 
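
The proportions quoted in the two preceding paragraphs follow directly from the counts in Table 2 and from Strong's medical follow-up figures given earlier (70 A ratings and 14 B+ ratings among 108 physicians). The short sketch below simply recomputes them; the variable and function names are chosen for the illustration.

```python
# Counts from Table 2 (fourteen-year validation).
table2 = {
    "Direct": {"Good Hit": 22, "Poor Hit": 7, "Clean Miss": 14},
    "Indirect": {"Good Hit": 5, "Poor Hit": 5, "Clean Miss": 7},
}


def proportions(counts):
    """Convert a dict of counts into a dict of proportions of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


direct = proportions(table2["Direct"])
indirect = proportions(table2["Indirect"])
combined = proportions(
    {k: table2["Direct"][k] + table2["Indirect"][k] for k in table2["Direct"]}
)

# Strong's medical follow-up: A or B+ on the Physician scale in college.
strong_high = (70 + 14) / 108
strong_miss = 1 - strong_high
```

For the 43 directly testable cases this gives about .51 good hits and .33 clean misses, against about .78 high scores and .22 misses in Strong's medical follow-up.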


Strong’s First Proposition 


Had a counselor used these tests, in 1939, 
to suggest to the boys their likeliest future 
vocation, he would have been downright mis- 
leading only once in every three attempts. 
Yet even the “good” tests would have pre- 
sented him with a grave difficulty: the tests 
containing accurate predictions also contain 
too many “extraneous solutions.” Like a 
mathematician solving a cubic equation, the 
counselor must enter the problem with the 


expectation that not all the answers offered 
will be real and pertinent. 

Whatever its letter rating, the scale most 
pertinent to future choice of occupation 
ranked anywhere from 1st to 33rd highest out 
of the 44 scales for which each test was 
scored. The median rank of the most pertinent 
scale was 5th. That means that the 
counselor using these tests could have ex- 
pected, on the average, four “extraneous solu- 
tions” with higher-ranking scores than the 
true solution. It is, of course, true that the 
“extraneous” quality of certain high scores is 
obvious: few would counsel a tone-deaf boy 
to be a musician. 

Strong (21) states that “a college student 
who continues ten years in the same occupa- 
tion enters an occupation in which he ranks 
second or third best.” Like our group as a 
whole, our men who continued in the same 
occupation (not considering interruption by 
the war) entered occupations in which, on the 
median, they ranked fifth best. Once again, 
our figures are slightly less impressive than 
Strong’s. It is certainly not true that among 
our cases men “continuing in occupation A 
obtain a higher interest score in A than in 
any other occupation.” 


Strong’s Second Proposition 


The proposition that men engaged in an
occupation score higher on that occupational
scale than all other men is well supported by
our data. That is, doctors outscore controls
on the Physician scale, lawyers outscore controls
on the Law scale, etc. (Controls are
simply all the rest of the 61 cases.) This is
true for every directly scaled occupation that
occurs more than once.


Table 3


Testing Strong’s Second Proposition

                                Average Score      Average Score
                                of Men Engaged     of All
Occupation                      in Occupation      Other Men

Physician (N = 12)              [illegible]        32.8
Lawyer (N = 11)                 [illegible]        30.5
Public Administrator (N = 5)    [illegible]        [illegible]
Engineer (N = 4)                [illegible]        30.1
Chemist (N = 3)                 [illegible]        33.2
Minister (N = 2)                [illegible]        29.2

(One further entry, 39.6, is legible in the source but cannot be assigned to a cell.)

Strong’s second proposition seems to be 
valid. 


Strong’s Last Two Propositions 


Seventeen of our sixty men have made 
changes in occupation other than shifts en- 
forced by entering the armed services. Often, 
these men abandoned two or more vocations 
before settling on the job they are engaged in 
today. Strong’s follow-up data showed that 
men who abandoned an occupation were likely 
to possess lower scores on that occupational 
scale than the scores made by men who con- 
tinued on the job. 

Table 4 tests that proposition in our own
figures. Strong found that rule to hold “except
for the records of two individuals,” while
we, except for one instance of tie, find it to
be entirely so.

Table 4


Testing Strong’s Third Proposition

                          Men Continuing     Men Leaving
Occupational Scale        Mean Score         Mean Score     N

Physician                    42.3               35.5        2
Lawyer                       40.6               33.0        3
Public Admin.                45.8               42.5        4
Author-Journalist
  (Teaching)                 49.7               33.0        3
Engineer                     53.8               34.0
Office Man                   44.0               35.0
Production Mgr.              30.0               21.0
Pres. Mfg. Co.               25.3               34.0
Physicist                    42.5               34.0
Chemist                      45.0               18.0
Author-Journalist
  (Writing)                  54.0               32.5
Minister                     44.0               35.0
Salesman                     42.0               30.0
Senior C.P.A.                43.0               43.0

(The remaining N entries are not legible in the source.)


Another generalization Strong offers about
men who change vocational fields is that they
will proceed from a field in which they have
a low score into a field in which they score 
high. That was true of 9 of our changeable 
men, 7 men going contrary to their tests and 
entering new jobs for which their test scores 
were lower. (One man changed between jobs 
with identical scores.) These figures run 
faintly in the right direction, probably looking 
even less convincing than the data from which 
Strong felt that proposition 4 was “almost 
but not quite sustained.” 
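
How faintly these figures run in the right direction can be illustrated with a simple sign test. The sketch below is an illustration only and is not part of the original analysis: it assumes that each change of field is an independent trial with an even chance of moving toward a higher or a lower score, and it sets the one tie aside. The function name `sign_test_one_sided` is our own.

```python
from math import comb


def sign_test_one_sided(successes: int, failures: int) -> float:
    """Return P(X >= successes) for X ~ Binomial(n, 0.5), ties already excluded."""
    n = successes + failures
    favourable = sum(comb(n, k) for k in range(successes, n + 1))
    return favourable / 2 ** n


# Counts from the text: 9 men moved to a higher-scored field, 7 to a lower one.
p_value = sign_test_one_sided(9, 7)
print(f"one-sided sign test p = {p_value:.3f}")
```

Under these assumptions a 9-to-7 split is well within what chance alone would produce, which agrees with the cautious reading given in the text.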


Contentment in Occupation 


As Strong has pointed out (23), “the 
validity of an interest test should be meas- 
ured in terms of satisfaction” but for this 
“there is no satisfactory measure.” The 
Study of Adult Development has accumulated 
much data on expressed satisfaction and dis- 
satisfaction with occupational choice, through 
the use of annual questionnaires. Even as we 
heed the force of Murray’s (16) warning that 
one must draw conclusions from inferred 
sentiments, not from expressed sentiments, we 
may ask some operational questions about the 
relations between expressed satisfaction in 
1953 and the interest score obtained 14 years 
previously. 

The 1953 questionnaires were still coming 
in when this was written. Of the 60 men in 
whom we are interested, 37 had returned their 
questionnaires. There was, as a matter of 
fact, some tendency for the men engaged in 
occupations for which they possessed a favorable
Strong score to return their questionnaires
early! (Three-quarters of them had
done so, as against half the men with lower
scores. For this Fisher’s “p” comes out .09.)
This is not so trivial an indication as it may 
appear; the Study staff has long been aware 
that among people who are hardest to hear 
from are those who have a sense of not hav- 
ing succeeded. 

Several 1953 questions were pertinent to an
inferrable sentiment of job satisfaction. They
may be abbreviated as: (a) Are you contemplating
a change in the near future?
What considerations entered this? (b) To
what extent has the job produced strains?
(c) What special events have occurred in the
last year? (d) What is your outlook on your
personal future? And what is the principal
basis of this? Not rarely, a participant
makes use of the backs of the questionnaire
pages to write us a letter in which discussions
of job problems may be found.

There were thirteen men, in all, who
showed some evidence of discontent, in answer
to one or another of the questions.
These thirteen, who are “less than completely
happy” about their jobs, include disproportionately
few who scored A on the Strong.

Table 5 gives the figures. Fisher’s “p”
comes out less than .05.


Table 5


Job Satisfaction and Strong Score

                         Apparently    Express
Score                    Happy         Discontent    Total

“A” on Strong scale         14             3           17
Lower scores on
  Strong scale              10            10           20

Total                       24            13           37
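
The probability quoted for Table 5 can be checked with a short computation. The sketch below assumes a one-sided Fisher exact test (the direction having been predicted in advance), which the text does not state explicitly; the function name `fisher_one_sided` is ours, and the cell counts are those of Table 5 (14 and 3 for men scoring “A,” 10 and 10 for men with lower scores).

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose upper-left cell is at least as large as `a`.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, col1)


# Rows: "A" on Strong scale, lower scores. Columns: apparently happy, discontent.
p_table5 = fisher_one_sided(14, 3, 10, 10)
print(f"Table 5, one-sided Fisher p = {p_table5:.3f}")
```

On this reading the probability falls a little below .05, in agreement with the figure reported.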

The question about job strains is the only
one of those contributing to this general
result that itself approaches significance.
Though the figures in Table 6 are not impressive,
“p” comes down to .08.


Table 6


Job Strains and Strong Score

                           No Strains    Reported
Rating                     Reported      Strains      Total

A or B+ Strong Rating
Lower Strong Rating

Total

(The cell entries of Table 6 are not legible in the source.)


The contributions of the other questions,
though in the predicted direction, are too
small in numbers to reach significance. (An
example: men now occupying the lower-rated
occupations are twice as frequently contemplating
a change.)


Other Evidence 


These findings, though not so favorable to 
the test as Strong’s results, nonetheless sug- 
gest that the test has its usefulness. Further- 
more, someone familiar with the Study partici- 
pants cannot read through Table 1 without 
acquiring some feeling that, however inac- 
curate its predictions of behavior, the test 
is measuring interests. There is the evidence, 
for example, of the correlated pair of scores: 
Lawyer and Public Administrator. Some men 
enter the law because they have politics in 
mind. Cases 20, 25, and 27 are examples. 
In case 27, the Public Administrator score 
matches that for Lawyer. In case 25, the 
Lawyer score is low; the choice of lawyer 
would seem to have been contraindicated. 
That would have been correct. Case 25 
escapes being one of our dramatically un- 
happy group only because the practice of law 
is rationalized as a means to a political end. 
The Strong has measured the relative interest 
in law and politics quite accurately. Indeed, 
the suggestion of power motives given by the 
Strong is more than borne out by projective 
tests. (Case 24 is in sharp contrast. Though 
actually working for the government, this 
man is not interested in politics. That is 
what his Strong scores fourteen years ago 
predicted.) Some indication of the injustice 
of “occupations entered” as a criterion of in- 
terest may be had from case 20. In the table, 
this man is reported as a lawyer and his low- 
ish score on that scale makes him count in 
the validation as a “Poor Hit.” Yet he, too,
intends to use law as a stepping-stone into 
politics, a fact that was not shown in the 
table, since circumstances have prevented his 
carrying out his plans. His score on Public 
Administrator is an A. That is also the scale 
on which he ranks first. 

One is impressed by the logic underlying 
the relative efficacy of the test in predicting 
well or poorly certain occupational choices. 
Engineers, ministers, and teachers seem to be 
highly predictable; all three are likely to 
choose their vocation in response to an inner 
“call.” By contrast, men who are in their
own business (which, for all three under that
heading in Table 1, means an “externally pre- 
scribed” choice) the Strong simply does not 
predict. Another way of stating these facts
would be to assume that the Strong tested in- 
terest and that the difference in prediction 
represented differences in the importance of 
interest as a factor in various sorts of career 
choice. The very patterning of the failures 
of the test therefore confirms its validity as 
a measure of interest! 


Private and Public School Results 


Suppose we explore the consequences of 
postulating that the Strong does measure in- 
terests. We infer that the test will predict 
future job-choices only for those men who 
(consciously or unconsciously) give weight to 
their own interests when they choose a career. 
For men who do not follow their interest, the 
test will not predict. We therefore expect 
the Strong’s “validity” to vary between 
groups known to take their own interests 
more or less seriously. A major instance of 
such a prediction is provided by our tests 
from men who prepared for Harvard at public 
and private secondary schools. 


The public school boy has usually been 
raised in the “American success culture,” described
by many anthropologists (1, 2, 4, 10,
15). His parents’ efforts focussed on preparing
the boy for future vocational achievement.
Job choice has been for him a vital
matter; his future self-estimate will hinge on 
his job-title and on how well he does within 
his occupational field. As one Study par- 
ticipant explained it, “I have satisfied myself 
as to my ability to compete successfully with 
most of my contemporaries.” 

The private school boy will often have been 
reared in a variant orientation, ably described 
by Florence Kluckhohn (10), where child- 
rearing was intended to perpetuate in him a 
“preferred personality.” Occupational role 
will have been subordinated to family social 
patterns. In our 1953 questionnaires eleven 
private school boys but only three public 
school boys put family interest or personal 
breadth ahead of achievement values when 
discussing their “personal future.” As Kluckhohn
so nicely phrased it, the contrast is between
two subcultures, one emphasizing a
“Doing,” the other a “Being,” orientation.

One consequence of this subcultural contrast
is a difference in the importance assigned
to interests when men make their vocational 
choice. In the “success culture” a son is ex- 
pected to surpass (therefore often bypass) his 
father’s occupation. Choosing a job is for 
him a vital matter, the more so because the 
choice is so greatly “up to him.” So much 
hinges on his making a “right” choice, cal- 
culated to yield maximal success, that he will 
often consult his own interest pattern, either 
introspectively or with formal aid from a 
vocational counselor. By contrast, the purest 
case of the upper class variant is a man whose 
permitted choices are limited to three: trustee, 
lawyer or doctor. Patricia Smith (20) de- 
scribed the sanctions that suppress other al- 
ternatives. (The Study has witnessed dra- 
matic conflicts within upper class men when 
personal “calls” gave way before the pressure 
of tradition.) While the average private 
school boy is not subjected to so focal a 
pressure, he will nevertheless possess values 
reinforcing the tangible demand that he join 
his father or uncle in The Business and the 
intangible expectation that he will first of all 
be the Right Sort. As one participant wrote, 
“As near as I can tell I have those (personal) 
qualities in some small measure, so I think 
it foolish to spend time thinking about my 
future.”

If all this is true, we arrive at the prediction
that interests will matter less and therefore
the Strong will be less valid when applied
to the behavior of private school boys. Table
7 shows this to be the case. Chi square suggests
p less than .05; if we combine cells
(avoiding the low cell and isolating the relation
between public school attendance and
“Good Hits”), we can apply Fisher’s formula
and arrive at p below .01. Our proposition
seems well validated.


Table 7


Validity of Strong Test Applied to Public and
Private School Boys

Validity         Public    Private    Total

Good Hit           19         8         27
Poor Hit            4         8         12
Clean Miss          8        13         21

Total              31        29         60
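
Both probabilities can be reproduced from the counts of Table 7. The sketch below makes two assumptions not spelled out in the text: that the chi square is the ordinary Pearson statistic on the 3 x 2 table (2 degrees of freedom), and that “Fisher’s formula” is applied one-sidedly to Good Hits versus all other tests. The function name `fisher_one_sided` is ours.

```python
from math import comb, exp

# Table 7 counts as (public, private) for each validity category.
observed = {"Good Hit": (19, 8), "Poor Hit": (4, 8), "Clean Miss": (8, 13)}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
n = sum(col_totals)

# Pearson chi square: sum of (observed - expected)^2 / expected over all cells.
chi_sq = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / n
        chi_sq += (obs - expected) ** 2 / expected

# With 2 degrees of freedom the upper-tail probability is exactly exp(-x / 2).
p_chi = exp(-chi_sq / 2)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p for [[a, b], [c, d]], margins held fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(total, col1)


# Combined cells: Good Hit versus (Poor Hit + Clean Miss), public versus private.
p_fisher = fisher_one_sided(19, 8, 4 + 8, 8 + 13)

print(f"chi square = {chi_sq:.2f}, p = {p_chi:.3f}")
print(f"combined cells, one-sided Fisher p = {p_fisher:.4f}")
```

Under these assumptions the chi square comes to about 6.9, with p near .03, and the combined-cell probability falls below .01, as reported.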

If we translate Table 7 into percentage, we 
discover that three-quarters of the public 
school tests gave some sort of “hit” on the 
occupation engaged in fourteen years after 
testing. That is exactly the figure reported 
by Strong (23) for his twenty-year follow-up. 
If, on the other hand, we try to apply the 
test to private school boys, our predictions 
will be useless almost half the time. 
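
The percentages in this paragraph follow directly from Table 7. A minimal sketch of the arithmetic, counting Good Hits and Poor Hits together as “some sort of hit” and Clean Misses as useless predictions (the variable names are ours):

```python
# Table 7 counts by type of secondary school.
table7 = {
    "public": {"good": 19, "poor": 4, "miss": 8},
    "private": {"good": 8, "poor": 8, "miss": 13},
}

rates = {}
for school, counts in table7.items():
    total = sum(counts.values())
    rates[school] = {
        "any_hit": (counts["good"] + counts["poor"]) / total,
        "clean_miss": counts["miss"] / total,
    }
    print(f"{school}: any hit {rates[school]['any_hit']:.0%}, "
          f"clean miss {rates[school]['clean_miss']:.0%}")
```

The public school figure comes to roughly three-quarters (23 of 31), and the private school miss rate to a little under one-half (13 of 29).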

Splitting out the public school cases, we 
can try revalidating Strong’s four propositions.
Proposition 1 fares better: men engaged
in occupation A still do not have “a
higher interest score in A than in any other 
occupation” but the median rank of the per- 
tinent scale is third, where formerly it was 
fifth. That is more consistent with Strong’s 
claim, quoted earlier, that the occupation con- 
tinued in will have ranked first, second, or 
third. Proposition 2 is no better for the 
public school group alone; that is because 
some occupations (engineer, chemist) attract 
high scores from public school, while others 
(lawyer, minister) attract higher scores from 
private school. At any rate, Proposition 2 
was already verified sufficiently. Proposition 
3 was already verified in every comparison, 
and so cannot be improved. There is one
scale (Public Administrator) on which Propo- 
sition 3 is false for the private school group 
but true for the public school group. Propo- 
sition 4 is about equally valid in both groups. 


Discussion 


This finding will raise various questions, 
some of which can be answered from our data. 
To forestall one, the “private school effect”
cannot be explained in terms of income. It is
true that the Strong is less accurate when ap- 
plied to families receiving over sixteen thou- 
sand dollars a year, but this figure marks only 
the upper quartile of our income statistics, 
while the “private school effect” is visible at 
all income levels. For example, in the second 
income quartile, with income held reasonably 
constant, between four and six thousand dollars,
public school tests score good hits 75%
of the time, private school tests only 40%. 
In all income quartiles that are adequately 
represented by public school cases, the pro- 
portion of misleading tests remains about 1 
in 4; in all income quartiles that are ade- 
quately represented by private school cases, 
the proportion of misleading tests remains 
about 1 in 2. 

These figures suggest that it is the fact of 
having attended private school (or of being 
reared in a subculture from which one is sent 
to a private school), rather than income, and 
somewhat independently of social class, that 
depressed the validity of the test. Several 
explanations suggest themselves. The most 
obvious would be that the Strong was vali- 
dated against public school graduates. (Re- 
gional differences in patterns of secondary 
education would have led to this circum- 
stance.) Next most obvious might be that 
attending private school is one of those “experiences
affecting interests” that Super (24)
warns us have been too little studied. 


Related Findings 


The effects of private school mores on per- 
sonality reported here are not isolated phe- 
nomena. Private school boys have previously 
been assumed to possess a special system of 
values, by scientists (10, 20, 25, 26), novel- 
ists (11, 12, 17) and deans (27). Empirical 
demonstrations show that their responses dif- 
fer from those of public school boys on 
projective tests (13), especially with regard 
to the projection of need Achievement (14), 
the need that underlies the results reported 
here. What is said here of their attitudes to 
vocational success has long been known with 
regard to their attitude toward academic suc- 
cess (18) and the effect of this attitude on 
their grades has long been empirically dem- 
onstrated (5, 6, 19). Very much that is 
known about this topic remains unpublished. 

It seems to the writer that psychologists in 
Eastern universities, by failing to report the 
public-private school differences in their data, 
are failing to record a fine “natural experi- 
ment” in the laws governing culture and per- 
sonality. The New England private school 
boy is often that rarest of subjects in the
psychological laboratory: a member of one
of America’s geographically scattered upper 
classes (3). The Chicago group (1, 2, 7, 9, 
25, 26) has done much to call our attention 
to differences between middle and lower class 
personalities. Are not differences between 
subcultural personalities in the middle and 
upper classes likely to be just as great? 


Summary and Conclusions 


A fourteen-year follow-up was made of 
Strong Vocational Interest Blanks adminis- 
tered in 1939 to participants in the Study of 
Adult Development. The validity of the test 
as a predictor of occupational choice at first 
appeared to be slightly lower than that re- 
ported by Strong. Of Strong’s four valida- 
tion propositions, two were confirmed, one 
(that lawyers outscore non-lawyers on the 
Law scale, etc.) strikingly, the other (that 
lawyers obtain one of their best scores on the 
Law scale, etc.) less so. The median test
offered four “extraneous” predictions.

It was possible to demonstrate a relation 
between conformity to choices commended by 
the test and future vocational happiness. 
Choosing a job for which one had (some years
before) scored “A” also seemed to reduce the
likelihood of developing fatigue, irritability or 
other symptoms of strain. 

The proposition was offered that SVIB 
validly measured interests but that failure 
to predict what job a man would choose 
could be explained in terms of his making
the choice on some basis other than interest. 
Certain case histories supported this idea as 
did the apparent pattern in occupations which 
the test predicted accurately and which it did 
not. 

As a corollary of this proposition and on 
the basis of what has been learned elsewhere, 
it was predicted that the Strong would be 
applicable to boys who attended a public
secondary school but less useful for boys 
who had prepared in a private preparatory 
institution. That was the case. The pre- 
dictive validity of the test among the public 
school group was almost exactly that origi- 
nally reported by Strong. Among private 
school boys, the test was, half of the time, 


Charles McArthur 


inapplicable. Further, Strong’s first valida- 
tion proposition was improved in the public 
school group, the median test record offering 
only two extraneous predictions. 

The import of this finding may be read in 
one of two ways. If we assume the an- 
thropological theories about the American 
middle and upper classes to be true, then this 
is a demonstration that “invalidity” in the 
Strong arises because interests do not de- 
termine choice rather than from failure of 
the test to measure interests. On the other 
hand, the implication that there may be a 
distinct psychology of the upper class is also 
pointed out. 

From all this may be drawn the follow- 
ing conclusions: 


1. The Strong has at least the validity 
claimed for it as a measure of interests. 

2. Its most rigorous validation criterion 
will be the prediction of actual behavior, but 
even that criterion is met at least 1 time in 2. 

3. We may regard as critical for under- 
standing the use of the test Strong’s (23) 
proposed “future calculations as to how much 
other factors, such as economic conditions, 
family pressures, etc. affect a man’s occupational
career.” In this respect attention
should be called to upper class variants of 
the American personality. 

Further study of: (a) the effects of en- 
vironmental press in conflict with interests 
measured by the Strong; and (b) the differ- 
ences between public and private school per- 
sonalities will be made from Study of Adult 
Development data. 


Received September 21, 1953.


Vocational Interests of Naval Aviation Cadets *


Nathan Rosenberg and Carroll E. Izard 


The Tulane University 


In interviews with cadets who voluntarily 
withdraw from the Naval Air Training Pro- 
gram, an active dislike of flying was one of 
the most important expressed reasons for 
withdrawal (1). An attempt was made, 
therefore, to investigate the importance of in- 
terests as a correlate of success in Naval 
Aviation. This attempt was directed toward 
an examination of broad interest patterns of 
cadets through measurement of their voca- 
tional interests rather than dealing with 
specific interests in flying and the training 
program itself. Since questions about flying 
and the program are avoided in tests of voca- 
tional interests, such measures were consid- 
ered more subtle, less subject to momentary 
fluctuations in attitudes that seem present in 
newly arrived cadets, and of greater psycho- 
logical importance. 

The Kuder Preference Record, Vocational, 
Form B, (3) was chosen to measure voca- 
tional interests since it is one of the interest 
questionnaires which has been most widely 
studied for validity. Form B was selected 
because it had been administered in World 
War II to a population of Air Force cadets. 
The writers feel that Navy and Air Force 
attrition samples differ in many important 
characteristics, and the proposed comparison 
will present definitive evidence with regard to 
measured vocational interests. Generaliza- 
tions from World War II Air Force data are 
often made concerning the importance of 
many psychological characteristics for selec- 
tion of pilots. If this Air Force population 
differs in important respects from other avia- 
tion populations, such generalizations should 
be tempered. 

*This article was presented as a report to the
U. S. Naval School of Aviation Medicine, Pensacola,
Florida, under ONR Project NR154-098. Opinions
or conclusions contained in this report are those of
the authors. They are not to be construed as necessarily
reflecting the views or possessing the endorsement
of the Navy Department.

At the outset certain methodological considerations
should be noted. It is reasonable
to assume that certain vocational interest patterns
may cause cadets to enter Naval Air
Training. Once this pre-selection has oper- 
ated, there may or may not be a relationship 
between interests and successful completion of 
training. That is, interests may cause entry 
into training but they may or may not be 
predictive of success after pre-selection has 
occurred. Thus, it is important to consider 
whether naval aviators possess distinguishing 
interests prior to entry into training. 

Should selective drop-out during training 
occur, it is possible that interests operate as 
a post-selective device. This implies a cor- 
relation between interests and successful com- 
pletion of training. This correlation is most 
adequately tested by a longitudinal approach 
in which entering cadets are tested and then 
followed through the program to identify 
successful and non-successful cases. A com- 
promise to this longitudinal study is afforded 
by the cross-sectional approach. Entering 
cadets, non-successful cadets, and successful 
cadets are tested and their mean interest 
scores compared. Mean scores which differ 
systematically are interpreted as evidence for 
a correlation between interests and successful 
completion of Naval Air Training. 

Should training, or factors operating dur- 
ing training, change the interests of entering 
cadets, the change might contaminate in- 
ferences regarding test validity. For example, 
maturation of cadet interests over an 18 
month training period might well influence 
apparent test validity. In this report, selec- 
tive drop-out during training is assumed to 
result from differences in interests between 
an attrition and successful group. The pre- 
ceding considerations have been made so that 
appropriate safeguards will be followed in 
interpreting the results. 

The following questions are considered in 
this report: 


1. Do the vocational interests of entering
Naval Aviation Cadets differ significantly
from an unselected vocational group, namely
the norm group found in the test manual for
the Kuder Preference Record?

2. Do the vocational interests of successful
Naval Aviation Cadets differ significantly
from an attrition population of cadets who
withdraw at their own request?

3. Do the vocational interests of present-day
Naval Aviation Cadets differ significantly
from a wartime Air Force population of
cadets?


Table 1


Means and Standard Deviations for Kuder Interest Scores on Various Groups Considered

                  Entering Naval                   (DOR) Voluntary   World War II    Kuder’s
                  Aviation        “Successful”     Withdrawals       Air Force       Normative
                  Cadets          Cadets           from Training     Cadets          Population
                  N = 651         N = 137          N = 137           N = 937         N = 2667
Kuder Interest
Area              M      SD       M      SD        M      SD         M      SD       M      SD

Mechanical        81.5   17.8     85.8   16.3      73.3   20.9       86.0   15.6     78.6   22.8
Computational     33.9   11.1     33.2   11.3      33.3   12.6       33.2    9.3     35.3   10.6
Scientific        70.7   14.8     68.4   14.5      61.1   16.4       67.6   12.6     64.0   15.5
Persuasive        71.7   18.8     73.9   19.9      82.3   20.4       68.4   16.8     74.4   20.6
Artistic          50.9   13.4     53.1   14.4      50.4   16.0       49.3   13.3     46.1   13.6
Literary          38.6   13.6     35.5   11.1      41.1   14.4       46.4   13.3     47.8   15.1
Musical           19.6    9.5     17.2    8.6      19.7    9.5       19.0    9.0     16.6    9.6
Social Service    66.4   17.0     65.7   16.3      69.3   16.5       63.7   14.3     [illegible]
Clerical          41.0   11.6     42.7   12.7      4?.9   14.1       46.4   12.1     52.1   13.5

(A question mark stands for a digit that is not legible in the source.)


Procedure


Samples Used. 1. Entering classes 3-53 through
10-53 and classes 16-53 through 23-53 were
tested. A total of 16 classes consisting of 651
subjects were included in this group. By the
completion of Naval Air Training, about 15 per
cent of an entering class will have withdrawn
voluntarily. Attrition from all other causes generally
averages to about this same percentage;
thus total attrition averages about 30 per cent.

2. The successful group consisted of 137 cadets
who were tested at Corry Field, approximately
nine months after entry into training. Based
upon previous experience, it is estimated that
over 90 per cent of these subjects will graduate.

3. A total of 137 DOR cases (Dropped at
Own Request) were tested, as many as administratively
possible during the period from about
1 January through 1 June 1953.

4. The norm group consisted of 2,667 adult
men engaged in diversified occupations, obtained
from the manual for the Kuder Preference
Record.

5. From published Air Force data, results were
available for 937 wartime cadets, 721 of whom
graduated primary training and 216 of whom
were eliminated (2).


Results 


Table 1 presents a summary of means and 
standard deviations for the nine interest areas 
measured on the groups considered. Table 2 
shows critical ratios testing the significance 
of the differences in mean interest scores for 
the groups compared. 
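
The report does not state the formula used for the critical ratios. A minimal sketch, assuming the conventional ratio of the difference between two means to the standard error of that difference for independent groups, is given below. The function name `critical_ratio` is ours, and the figures used are the mechanical-interest means and standard deviations of the successful and DOR groups as best they can be read from Table 1.

```python
from math import sqrt


def critical_ratio(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> float:
    """Difference between two means divided by its standard error.

    Assumes independent groups; the standard error is
    sqrt(sd1^2 / n1 + sd2^2 / n2).
    """
    return (m1 - m2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Mechanical interest: successful cadets (M 85.8, SD 16.3, N 137)
# versus voluntary withdrawals (M 73.3, SD 20.9, N 137).
cr_mechanical = critical_ratio(85.8, 16.3, 137, 73.3, 20.9, 137)
print(f"critical ratio, mechanical = {cr_mechanical:.2f}")
```

With these inputs the ratio is about 5.5, well beyond the value of 2.58 conventionally required at the 1 per cent level.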

Comparison of Entering Cadets’ Interests 
to Kuder’s Norm Group. The norm group
consists of “2,667 adult men engaged in oc- 
cupations, with each major occupational 
group weighted in proportion to its occur- 
rence in the general population (with the 
exception of unskilled and semi-skilled work- 
ers)” (3). 

On the average (Tables 1 and 2), entering 
cadets possess significantly different interests 
from those found for Kuder’s norm group in 
all nine interest areas measured. Entering 
cadets are relatively more interested in scien- 
tific, artistic, musical, and mechanical ac- 
tivities and relatively less interested in cleri- 
cal, literary, social service, persuasive, and 
computational activities. 

The difference in interests between the two
groups can also be gauged by the following
procedure. The
mean interest scores for entering cadets and
the norm group are placed on the distribution
of scores for the norm and percentile ranks
obtained. Percentile ranks obtained by this
procedure are presented in Table 3.


Table 2


Critical Ratios Testing Significance of Difference in Mean Kuder Interest Scores †

         Mec       Com      Sci        Per       Art       Lit        Mus          SS

A. Entering Cadets versus Norm Group
C.R.     3.42**    2.41*    10.22**    3.20**    8.08**    14.97**    [illegible]  9.76**

B. Successful Cadets versus Voluntary Withdrawals (DOR)
C.R.     5.52**    0.06     3.92**     3.46**    1.43      3.62**     2.32*        1.82

C. Entering Cadets versus World War II Air Force Cadets
C.R.     5.27**    1.38     4.32**     3.61**    2.37*     11.29**    1.34         3.28**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.

† Mec = Mechanical; Com = Computational; Sci = Scientific; Per = Persuasive; Art = Artistic; Lit = Literary;
Mus = Musical; SS = Social Service; Cle = Clerical.

(The Clerical column is not legible in the source.)

In a perfectly normal distribution, the
mean interest scores for the norm group
would all lie at percentile rank of 50, the
median score. Deviations from a percentile
rank of 50 for the norm group suggest the
direction and degree of skewness for the norm
distribution. Since all percentile ranks for
the norm group appear fairly close to 50,
the skewness, if significant, would not appear
pronounced. Inspection of the norm
distribution for mechanical interest, where
the mean score approximates a percentile
rank of 45, suggests that mechanical interest
scores are slightly skewed toward the high
end of the distribution. This explains an apparent
contradiction whereby entering cadets
show a mean mechanical interest score equivalent
to a percentile rank of 50 on the norm
distribution and, at the same time, show significantly
greater interest in the mechanical
area than the norm.


Table 3


Percentile Ranks of Interests for Entering
Naval Cadets and Norm Group

                                            Difference
                    Entering                Between
Interest            Naval        Norm       Entering and
Area                Cadets       Group      Norm Group

Mechanical            50          45           + 5
Computational         45          49           - 4
Scientific            67          50           +17
Persuasive            48          52           - 4
Artistic              65          54           +11
Literary              30
Musical               65
Social Service
Clerical              20

(Blank entries are not legible in the source.)
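
The procedure of placing a group mean on the norm distribution can be approximated by computation. The authors read their percentile ranks from the norm group’s actual distribution of scores; the sketch below instead assumes that distribution to be normal, so its results are only rough checks on Table 3. The function name `percentile_rank` is ours, and the inputs are the means and standard deviations as best they can be read from Table 1.

```python
from statistics import NormalDist


def percentile_rank(score: float, norm_mean: float, norm_sd: float) -> float:
    """Percentile rank of `score` in a normal distribution with the
    given mean and standard deviation (an approximation to the norms)."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(score)


# Entering cadets' mean placed on the norm group's distribution.
scientific = percentile_rank(70.7, norm_mean=64.0, norm_sd=15.5)
clerical = percentile_rank(41.0, norm_mean=52.1, norm_sd=13.5)

print(f"scientific: {scientific:.1f}")
print(f"clerical:   {clerical:.1f}")
```

Under the normal assumption the scientific mean falls near the 67th percentile and the clerical mean near the 20th or 21st, close to the ranks of 67 and 20 shown in Table 3. The agreement would be poorer for any interest area whose norm distribution is markedly skewed.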

The extremity of the differences between 
entering cadets and the norm group is em- 
phasized for the clerical, literary, and social 
service areas. Entering cadets seem pre- 
selected with respect to a relative dislike for 
activities of reading or writing (literary), 
routine filing or secretarial work (clerical), 
and activities which contribute to the welfare 
of people (social service). To a lesser ex- 
tent, they are pre-selected with respect to a 
relative liking for activities of the scientific, 
artistic, and musical interest areas. 

Since the norm group is presumably older 
than the cadet group, it may not be concluded 
that these differences are all characteristic of 
Naval Cadets as a vocational group. Some 
of the differences could reflect changes in in- 
terest characteristic of an older age group. 
Furthermore, cadets undoubtedly represent a 
population with more education than do the 
norm group. Thus some of the differences in 
interests could be a reflection of educational 
level which distinguishes the two groups, 
aside from vocational selection. When these 
factors are better controlled, it will be possi- 
ble to isolate which of the interest areas re- 
flect those characteristics of a vocational 
group and not those for age or educational 
groupings.

It would seem reasonable that a preference
for the scientific area would be the one area 
most likely to be truly characteristic of Naval 
Aviators as opposed to vocationally unse- 
lected groups. 

Comparison of Successful Cadets to Voluntary Withdrawals (DOR). Differences in
interests between the above two groups suggest the possible usefulness of the Kuder
Preference Record as a predictor of DOR 
attrition. As can be noted from Tables 1 and 
2, successful cadets are significantly more in- 
terested in mechanical and scientific activi- 
ties than DOR’s. They are significantly less 
interested in persuasive, literary, and musical 
interests than DOR’s. 

From these results, the interest picture for 
the successful cadet is an individual who has 
a positive attraction toward activities which 
involve the use of tools and machinery; he 
also likes abstract and theoretical activities of 
a scientific nature. The DOR appears to be 
an individual who is more interested in ac- 
tivities which involve convincing people (per- 
suasive), reading or writing, and apprecia- 
tion or participation in musical activities; he 
is less attracted by mechanical and scientific 
activities. 

In this connection, it should be recalled 
that entering Naval Aviation Cadets are se- 
lected with respect to mechanical aptitude 
since cadets with very low Mechanical Com- 
prehension Test scores are not admitted to 
the training program. These data indicate 
that mechanical interest, aside from mechani- 
cal aptitude, is important for successful com- 
pletion of the Naval Air Training Program. 
Further study will be made to evaluate the 
improved prediction of DOR attrition when 
aptitudes and interests are both considered. 

Comparison of Entering Cadets’ Interests 
with an Air Force Population. Critical ratios 
(Table 2) reveal some important differences 
in interest between the above two groups. 
Entering cadets’ interests differ significantly 
from the Air Force entering cadet population 
in all areas with the exception of computa- 
tional and musical activities. The differ- 
ences between the two groups are particu- 
larly pronounced for the literary, clerical and 
mechanical areas. 


Inspection of the mean scores for the Air 
Force eliminees from training reveals that 
differences in interests between the two at- 
trition groups are considerable. The Naval 
Cadet who withdraws voluntarily shows es- 
sentially a different interest pattern from the 
Air Force cadet who was eliminated from 
training during World War II. The reasons 
for this difference are not very clear, aside 
from motivation present during World War 
II which is not so pronounced today. How- 
ever, the important fact is that these two 
populations are different—at least with re- 
spect to interests. Thus if a test did not 
show validity on the Air Force population of 
World War II, this does not necessarily pre- 
clude its being valid for present day Naval 
Aviators. The attempt to use the Kuder in this study was undertaken
despite Air Force data which showed it to be invalid for predicting
pass-fail during World War II (2).


Discussion 


It will be recalled that successful cadets 
as compared with DOR groups possess higher 
mean interest scores for mechanical and scien- 
tific areas and lower for the persuasive, liter- 
ary, and musical areas. The mean interest 
scores for entering Naval Aviation Cadets lie 
between those found for successful and DOR 
groups for mechanical, literary, and musical 
interest areas (Table 1). These findings for 
the entering group are consistent with the 
assumption that selective drop-out from train- 
ing caused the significant differences noted 
between successful and DOR groups. How- 
ever, the scientific interest area deserves spe- 
cial comment since the mean scientific inter- 
est score for the successful group is 68.4, for 
the DOR group 61.1, but for entering cadets 
70.7. Although entering cadets are more like the successful than the
DOR, for a definite trend to be present, the mean interest scores for
entering cadets should lie between the means for successful and DOR
groups. The same reasoning applies for persuasive interest where the
mean score for the entering group does not lie between the DOR and
successful groups.

1 Mean interest scores for Air Force eliminees from training differ by
only a small fraction of a point from those for the Air Force
graduates, with the exception of artistic and social service interests.
Eliminees are about 2.0 and 2.5 points higher and lower in these two
interest areas respectively. Thus, interest comparisons may be made
directly to the total Air Force population means with little loss of
accuracy as compared to the eliminees from this population.

It is possible that entering cadets tend to 
over-rate their interest in scientific activities. 
Having just reported to Naval Air Training, 
it is conceivable that they would tend to rate 
themselves higher in scientific interest merely 
because they feel they should be high in this 
interest. 

Since further “cross-validation” will be ap- 
plied to these data in any case, an empirical 
check will be made for those interest areas 
apparently important for successful comple- 
tion of Naval Air Training. Based on the 
differences between successful and attrition 
cases, weights will be given to the interests 
that distinguish the two groups. From these 
weights, predictions of pass or DOR attrition 
will be made for entering cadets. In time, 
cadets who actually voluntarily withdraw and 
those who succeed will be determined. These 
results will be checked against the predictions 
made, and the actual utility of the Kuder Preference Record for
predicting DOR cases will be ascertained. From the results
presented in this report, it seems very likely that
the measured vocational interests of entering 
cadets will predict DOR attrition significantly 
greater than chance expectation. 


Summary 


The vocational interests of cadets would 
seem important for successful completion of 
Naval Air Training. Therefore, the Kuder 
Preference Record, a measure of relative pref- 
erence for nine broad vocational interest areas, 
was administered to 651 entering Naval Avia- 
tion Cadets, 137 DOR attrition cases (volun- 
tary withdrawals from training) and 137 
“successful” cadets. The successful cadets
were tested near completion of their basic 
training; from previous experience it is esti- 
mated that over 90 per cent of these cadets 
will graduate. 

Results indicate: 

1. Entering cadets show significantly more
interest in scientific, artistic, musical, and 
mechanical activities than a vocationally un- 
selected population. They are less inter- 
ested in clerical, literary, social service, per- 
suasive, and computational activities. 

2. Successful cadets are relatively more 
interested in mechanical and scientific activi- 
ties as compared to a group who withdraw 
from training at their own request. They are 
less interested in persuasive, literary, and 
musical activities than the voluntary with- 
drawal cases. 

3. The voluntary withdrawal group shows 
an essentially different interest pattern than 
the group eliminated from training in the Air 
Force during World War II. 

It is concluded: 


1. Entering cadets have interest patterns 
which are different from those found for a 
vocationally unselected group. Some of these 
distinguishing interests may arise because of 
cadets’ age or educational level rather than 
choice of Naval Aviation as a vocation. The 
factor of selection screening, as well as self- 
selection on the basis of interests, may have 
partially determined these interest patterns. 

2. The Kuder Preference Record shows 
promise of validity for predicting DOR at- 
trition. The mechanical, scientific, persua- 
sive, literary, and musical interest keys ap- 
pear the most important for this purpose. 

3. Some psychological tests which failed 
to predict attrition for World War II Air 
Force cadets may show validity for present 
day Naval Aviators. 
Coding the Kuder Preference Record—Vocational *

Underlying the use of vocational interest
tests in vocational planning is the assump- 
tion that if a person’s interests are similar to 
the interest of people in occupational groups 
who have experienced a high degree of satis- 
faction in their work, he will derive most 
satisfaction doing the same or similar kind 
of work. That is, if a person’s interests in 
“common-everyday things” are most similar 
to, say, engineers, there is a high probability 
that he will derive more satisfaction from 
work as an engineer or some closely related 
occupation than he would from other occupa- 
tions. There has been considerable research 
to substantiate this proposition. In order, 
then, for a counselor to be effective in the 
interpretation of his client’s interests as meas- 
ured by tests, he needs some sort of guide to 
aid him in giving the client a comparison of 
his interests with that of various occupational 
groups. 

This paper presents a guide for the coun- 
selor to use in interpreting the individual 
profile of the Kuder Preference Record— 
Vocational (Kuder PR-V). In order to facili- 
tate a meaningful interpretation of the test 
data to the client, it is often better for the 
counselor to speak in terms of several oc- 
cupational fields in addition to descriptive 
terms (such as mechanical, artistic, persua- 
sive), the meaning of which is often vague 
to the client. There is, therefore, a need for 
a grouping of occupations based on real test 
data which the counselor may feel confident 
in using. 

Kuder (12) grouped specific occupations under the various scale
headings of his inventory. However, many counselors have been
reluctant to interpret the client’s profile in terms of Kuder’s
groupings of occupations because many of the groupings were not
supported by empirical data. During the past few years Kuder and many
others have reported a considerable amount of empirical data about
various occupational groups which can be used to group occupations
according to interest test profiles. However, some of the
discrepancies between Kuder’s grouping (12, Table 1) and his empirical
data (12, Tables 2 and 3) are rather striking. Wiener (20) cited as an
example of one of these discrepancies Kuder’s “39” listing (Scientific
and Clerical interests) as including the occupation of pharmacist.
Looking at actual test results for a group of “pharmacists and drug
store managers,” however, one sees a significant elevation on scale 3
(Scientific) and only an average score on scale 9 (Clerical).

*A revision of University of Missouri Counseling Bureau Research
Report, No. 9a (Mimeographed), 1952. This report, which included
Tables 1 through 4 of the present paper, is available from the authors
at a cost of 50 cents per copy.

We find, as another example, that Kuder 
lists “Author; editor; reporter” under the 
categories of “4” (Persuasive), “6” (Liter- 
ary), “36” (Scientific-Literary), “46” (Per- 
suasive-Literary), “67” (Literary-Musical) 
and “68” (Literary-Social Service). From 
empirical research, Mathewson and Herbert 
(14) found that only the “67” category was 
the pattern for their group of 113 author- 
journalists. 

Also, one is not justified in saying, as Kuder 
(12) implies, that a person should score high 
on the mechanical scale in order seriously to 
consider engineering as a career. Chemical, 
civil, electrical, and sales engineers as groups 
do not have mean scores on the mechanical 
scale above the 65th percentile rank. Me- 
chanical engineers and some industrial engi- 
neers did score significantly high on the aver- 
age on the mechanical scale. The mean score 
of all professional engineers on this scale is 
below 65 P.R. (12, Table 2). Actually the 
interest typical of the large majority of engi- 
neers is characterized by significant elevations 
on scales 2 and 3 (Computational and Sci- 
entific). 

Other discrepancies are apparent after comparing Kuder’s groupings
with empirical results. Thus, many of Kuder’s original “logical”
groupings now can be replaced by the increased body of empirical data.
As a result a counselor can operate more effectively when he can base
his test interpretation on real data.

In order to make this information based on 
actual test data usefully available to the 
counselor it is necessary to have it organized 
into some system. Wiener (20) proposed a 
coding system which coded each individual 
score over the 75th percentile rank. How- 
ever, Frandsen (8) criticized Wiener’s system 
as not being comprehensive enough; that is, 
the 75th percentile rank was too rigorous a 
cutting point. Instead of using only the 
scores above the 75th percentile rank, Frand- 
sen suggested a coding system that would in- 
clude deviations outside the 65th to 35th 
percentile rank range, and thus gain much 
better differentiation among the various oc- 
cupations. By such a system, there would 
be less frequency of finding that the code 
for an individual’s profile matches identically 
the codes of many different occupational 
groups. Also, a mean score which falls out- 
side the 65th to 35th percentile range would 
be a significant deviation for almost any rea- 
sonably sized group. 

Diamond (5) has shown how the use of a 
uniform cutting score for the Kuder scales is 
misleading and not in keeping with the reality 
of the occupational world as revealed in the 
census data. He notes that “not 25 per cent 
of employed urban men, but approximately 
40 per cent, are engaged in occupations of a 
mechanical nature. It is, therefore, a sta- 
tistical absurdity to expect that all men who 
enter the mechanical field shall have mechani- 
cal interest above the 75th percentile rank.” 
On the other hand, there are some interest 
fields which employ only a fraction of one per 
cent of the labor force. Music is such a field. 
In this connection, Diamond points out that 
a musician who scores at the 75th percentile 
rank on the Kuder PR-V musical scale is 
more than two standard deviations below the 
mean of his occupational group. 

So far, relatively little attention has been
given to significantly low scores on interest 
test scales. The low scores may be equally 
useful in characterizing the interest of an 
occupational group as are high scores. For 
example, most engineers score significantly 
low on the social service scale. 

It appears, therefore, that a system for 
coding Kuder PR-V profiles which would re- 
flect the low as well as the high scores, would 
be quite helpful in studying the kind of in- 
terest which is typical of various occupational 
groups. Such a system should provide a code 
for a profile which is short and simple but 
which preserves a maximum amount of in- 
formation. 

A coding system is proposed here which 
meets Frandsen’s (8) objections to Wiener’s 
(20) system. It is similar to the system re- 
ported by Hathaway (10) and Holland et al. 
(11). 


Coding Procedure 


To code a profile, follow these steps. As an 
example we will use the percentile ranks cor- 
responding to mean raw scores on the vari- 
ous scales of the Kuder PR-V made by a 
group of surgeons (12). The first number 
denotes the scale and the number after the 
dash denotes the percentile rank: 


Out    Mec    Com    Sci    Per    Art    Lit    Mus    Soc    Cle
0-68,  1-45,  2-25,  3-75,  4-27,  5-61,  6-66,  7-62,  8-48,  9-25


Step 1. Select all scores of 75 P.R. and above and list the scale
numbers in descending order of magnitude of the percentile ranks.
Then place an apostrophe after these. Example: 3'.

Step 2. Select all scores between 74 P.R. and 65 P.R. inclusive and
list the scale numbers in descending order of magnitude of the
percentile ranks next after the apostrophe. Then place a dash after
these. Example: 3'06-.

Step 3. Select all scores 25 P.R. and below and list the scale numbers
in ascending order of magnitude of the percentile ranks. Then place an
apostrophe after these. Example: 3'06-29'.

Step 4. Select all scores 26 P.R. to 35 P.R. inclusive and list the
scale numbers in ascending order of magnitude of the percentile ranks
next after the apostrophe. Example: 3'06-29'4.

Step 5. Place the V-Score in parentheses after the entries so far.
This applies primarily to the coding of profiles of individuals.
Example: 3'06-29'4 ( ).

It is proposed that any serious user of the Kuder PR-V prepare codes
for all occupational groups for which profile data are available, such
as those reported in the manual (12, Tables 2 and 3) and in various
journal articles. Then a duplicate set of cards should be prepared for
each occupational group. (See example.) One set of cards should be
filed numerically according to code number and the other set
alphabetically according to job title. Data for men and women should
either be filed separately or on different colored cards.

The Dictionary of Occupational Titles (D.O.T.) code number is for
cross reference to that system of classifying occupations.

Once these two files are prepared, any given profile can be coded and
referred to the code file for a list of job titles which have similar
codes. Also, the job title file can be searched to determine if the
codes of the occupations being considered by the client agree
reasonably well with the code of the client’s profile. As new data
become available, appropriate cards can be prepared and inserted in
the card files.

    0-26.10           Surgeon           52         3'06-29'4
    (D.O.T.)          (job title)       (N)        (Code)

    Kuder PR-V, Form C, (P.R.):
    0-68, 1-45, 2-25, 3-75, 4-27, 5-61, 6-66, 7-62, 8-48, 9-25, V-

    Reference: Kuder, F. G., Examiner Manual for the Kuder Preference
    Record—Vocational, Form C. Chicago, Ill.: Science Research
    Associates. February 1953. Table 2.

    Notes: (description of the group, evaluation of the data, etc.)

Example of card showing codes.

Use of such a coding system facilitates more valid use of the Kuder
PR-V by bringing real data to bear on the interpretation of a profile
rather than basing interpretation on “logical” guesses which have
proved fallible in the past. It may be desirable to extend the code
file to include individual cases so that the counselor may then refer
to his own case records of individuals as well as occupational groups
for aid in interpretation.
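
The five coding steps and the code file described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not part of the original proposal: the names `code_profile`, `SURGEONS`, and `code_file` are hypothetical, percentile ranks are assumed to be whole numbers, and ties in percentile rank are broken by scale number, which the text does not specify.

```python
def code_profile(percentiles, v_score=None):
    """Return the profile code for a dict {scale number: percentile rank}."""

    def scales(selected, descending):
        # Order by percentile rank; ties are broken by scale number
        # (an assumption, since the procedure does not cover ties).
        ordered = sorted(
            selected,
            key=lambda s: (-percentiles[s] if descending else percentiles[s], s),
        )
        return "".join(str(s) for s in ordered)

    high = [s for s, pr in percentiles.items() if pr >= 75]            # Step 1
    near_high = [s for s, pr in percentiles.items() if 65 <= pr < 75]  # Step 2
    low = [s for s, pr in percentiles.items() if pr <= 25]             # Step 3
    near_low = [s for s, pr in percentiles.items() if 25 < pr <= 35]   # Step 4

    code = (
        scales(high, True) + "'" + scales(near_high, True) + "-"
        + scales(low, False) + "'" + scales(near_low, False)
    )
    if v_score is not None:                                            # Step 5
        code += f" ({v_score})"
    return code


# Percentile ranks for the surgeon group used in the worked example.
SURGEONS = {0: 68, 1: 45, 2: 25, 3: 75, 4: 27, 5: 61, 6: 66, 7: 62, 8: 48, 9: 25}

# A toy "code file": profile code -> job titles filed under that code.
code_file = {}
code_file.setdefault(code_profile(SURGEONS), []).append("Surgeon")

print(code_profile(SURGEONS))  # 3'06-29'4
```

Coding a client's profile with the same function and looking the result up in `code_file` corresponds to referring a coded profile to the numerical card file.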

From the various sources of research, four tables of data have been
compiled. Table 1 (M-code) lists the various male occupational groups
according to the numerical value of their codes. The number of
subjects in each group and the reference to the original data are also
given. Table 2 (M-alphabet) lists the various male occupational groups
alphabetically by job title. Table 3 (F-code) lists the various female
occupational groups according to the numerical code value. Table 4
(F-alphabet) lists the various female occupational groups
alphabetically by job titles.¹

It must be remembered that knowledge that an individual has interest
similar to a particular occupational group does not insure that he
will be successful or even satisfied in that occupation. The power of
Kuder PR-V to predict job success or satisfaction is largely unknown.
However, use of a system such as proposed here will bring us one step
closer to prediction of job satisfaction and possibly, to a lesser
degree, job success.

¹ Tables 1 through 4 have been deposited with the American
Documentation Institute. Order Document No. 4322 from the ADI
Auxiliary Publications Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance $1.75 for 35 mm.
microfilm or $2.50 for 6 × 8 in. photocopies. Make checks payable to
Chief, Photoduplication Service, Library of Congress.

There is a limitation in the use of codes of 
interest profiles when based on mean scores 
which should be borne in mind. If the inter- 
est of an occupational group is highly homo- 
geneous, a single code will reflect this interest 
pattern quite accurately. However, if the in- 
terest of an occupational group is quite 
heterogeneous, a single code will not reflect 
the interest of that group accurately. Several 
codes, each based on a homogeneous sub- 
group, may be required to reflect accurately 
the interest of an occupational group. 

An example of an occupational group which has heterogeneous interests
is “secondary school teachers.” The code for “all male secondary
school teachers” and that of several of the sub-groups can be
contrasted as follows:

    all secondary school teachers (male)     '8-'14
    commercial teachers (male)               9'2-1'3
    mathematics teachers (male)              23'-4'9
    social studies teachers (male)           8'6-1'5
    music teachers (male)                    7'6-123'
    vocational training teachers (male)      '15-4'7


An example of an occupational group which appears to have homogeneous
interests is “nurses.” The codes for “all trained nurses” and several
sub-groups are as follows:

    all trained nurses (female)              8'3-'94
    nurse educators (female)                 8'3-9'4
    general staff nurses (female)            '83-'94
    private duty nurses (female)             8'3-'94
    public health nurses (female)            8'-9'
    supervisors and head nurses (female)     '8-'94
Researchers are urged to investigate and 
report on the “homogeneity of interest” when 
reporting on the interest of any occupational 
group. This may be accomplished by re- 
porting the frequency of various codes which 
members of the group achieve. However, the establishment of a code
frequency distribution for an occupational group is often a difficult
task. It can be done in several ways, the first of which might well be
coding the mean scores. A second way might be a tabulation of how many
persons in a group had scores on the various scales coded in different
parts of the code; i.e., high (75 P.R. and above), near high (65-74
P.R.), low (25 P.R. and below), near low (26-35 P.R.), and not coded
(36-64 P.R.). Table 5 is
such a tabulation for 62 students in a nurs- 
ing education program. It can be seen from 
Table 5 that 54 of the 62 student nurses 
scored 75 P.R. or above on scale 8 (social 
service) and none of them scored as low as 
the 35 P.R. Ninety-five per cent of this 
group scored 65 P.R. or above on scale 8. 
Thus we see that a reasonably high score on 
scale 8 is typical for almost all student nurses 
in this group. By similar analysis we can 
find other characteristics of our group, such 
as a low score on scale 9 (clerical). 
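
A tabulation of this kind is easy to mechanize. The sketch below counts, for each scale, how many members of a group fall in each part of the code. The function names and the three sample profiles are hypothetical and are not the student nurse data of Table 5.

```python
from collections import Counter


def position(pr):
    """Classify one percentile rank into its part of the code."""
    if pr >= 75:
        return "high"
    if pr >= 65:
        return "near high"
    if pr <= 25:
        return "low"
    if pr <= 35:
        return "near low"
    return "not coded"


def tabulate(profiles):
    """Return {scale: Counter of code positions} for a list of profiles."""
    table = {}
    for profile in profiles:
        for scale, pr in profile.items():
            table.setdefault(scale, Counter())[position(pr)] += 1
    return table


# Hypothetical profiles restricted to scales 8 (social service) and 9 (clerical).
sample = [{8: 90, 9: 20}, {8: 70, 9: 30}, {8: 80, 9: 50}]
table = tabulate(sample)
print(table[8]["high"], table[8]["near high"])  # 2 1
```

Dividing each count by the group size gives percentages of the kind quoted for scale 8.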

The actual frequency with which any par- 
ticular code occurs in a group is probably the 
most precise way in which to describe the 
interest of the group. However, this method 
does not lend itself well to the making of 
summary statements about a group. Of the 
62 student nurses mentioned above, 18 of them achieved codes which
contained an 83-94 code. That is, they may have had other scales coded
high or low but scales 8 and 3 were coded high and scales 9 and 4 were
coded low. Forty-one of the 62 student nurses had scales 8 and 3 coded
high without regard for how other scales were coded. It required ten
different codes or code variations to account for all cases in this
group. However, eight of these ten codes were merely variations of the
83-94 code.


Summary 


A proposal has been made whereby interpretation of Kuder PR-V interest
profiles can be made more valid by bringing to bear upon the
interpretation the fast-growing body of empirical data relative to
typical profiles for various occupational groups. A system for coding
profiles has been described and some ways of using codes in
interpreting profiles have been presented. Finally, the effect of
heterogeneity of interest within a group upon the use of a single code
to describe the interest of that group was discussed as a limitation
and a caution in the use of codes.

Table 5

Frequency of Scores Appearing in the Various Parts of the Code for 62
Students in a Nursing Education Program

Position in the Code: High; Near High; Low; Near Low; Not Coded
Scales: Outdoor, Mechanical, Computational, Scientific, Persuasive,
Artistic, Literary, Musical, Social Service, Clerical
[Cell frequencies are missing from this copy.]

Transfer of Training in Tracking as a Function of Control
Friction * 


F. A. Muckler and W. G. Matheny 


University of Illinois 


The degree to which a training device 
should simulate the final task is of consider- 
able practical interest to those concerned with 
training as well as to the manufacturers of 
training devices. A training device may simu- 
late a psychomotor task to a greater or lesser 
degree along several dimensions. One of 
these dimensions is the control force neces- 
sary for accomplishing the task. The ques- 
tion becomes: what degree of fidelity of simu- 
lation of control force is necessary in the 
training device in order to secure optimum 
transfer of training? 

The present study was designed to investi- 
gate the effect of varying control friction 
upon transfer of training in a visually guided 
tracking task. 

Despite a considerable literature, the ex- 
perimental evidence on the effect of friction 
in control mechanisms is not clear cut. In 
general, friction has been found to be un- 
desirable, but the effect is a function of such 
variables as: (a) the type of friction involved 
(4, 7, 9, 13); (b) the tracking task (6, 8); 
(c) the presence or absence of inertia (7, 11); 
(d) the radii when handwheels or knobs are 
used (13, 14); and (e) the response measure 
recorded (6, 10, 15). Further, the effect of 
friction may be specific to complex interac- 
tions of many of these variables (7, 14). 

All of these studies are concerned with either original learning or
performance situations while the question of transfer from one control
friction to a different control friction remains relatively
uninvestigated. In a study summarized by Craik (2) and reported by
Vince (16), subjects were trained to make corrections with a lever
operated against a stiff spring. After the subjects were making
accurate movements, the spring tensions were changed. The new response
was found to be delayed by at least 0.16 second; this time interval
was termed the “kinesthetic reaction time.” More directly applicable
is the experiment reported by Bilodeau (1). Two groups rotated a crank
handle at either heavy or light loads for five minutes. A third group
practiced first under a light load and second under a heavy load
alternately for one-minute periods for five minutes. The fourth group
started under a heavy load changing to a light load under the same
procedure. Of interest here is the fact that when the latter two
groups were shifted, “rate output was approximately equal to that of
non-shifting groups” (1, p. 100). These data are interpreted here to
imply that there is no specific effect of previous practice on either
a heavy or light load to the performance of the task under the light
or heavy load, respectively.

1 This research was supported in part by the United States Air Force
under Contract AF 33(038)-25726, monitored by the Air Force Personnel
and Training Research Center. Permission is granted for reproduction,
translation, publication, use and disposal in whole or in part by or
for the United States Government. We should like to thank Dr. L. H.
Lanier, Dr. A. C. Williams, and Dr. W. E. Kappauf for their valuable
suggestions and criticisms.

In this experiment, the effect of changing 
friction upon the level of performance in a 
visually guided tracking task was investi- 
gated. Experimental evidence was sought for 
a change from a higher friction to a lower 
friction, from a lower friction to a higher 
friction, and from a “frictionless” condition
to a friction system. 


Experimental Method 


Experimental Task. The task was following 
pursuit tracking and required the subject to 
track a continuous sine wave drawn along a 
moving roll of paper. This line passed behind 
a horizontal slit in a viewing panel at a rate of 
four cycles per minute. A lever-type control
handle, moving horizontally forward and back- 
ward, controlled a pencil-type pointer. Track- 
ing responses were recorded in the form of a 
continuous response line on the paper with the 
stimulus line. 

The friction in the system could be varied sys- 
tematically by means of a brake drum attached 
to the control lever. The friction was independent
of both rate and extent of control movement.
Further, there was no “centering” tendency of 
the control lever. 

Procedure. Each subject was given the fol- 
lowing instructions: 

This instrument is called a tracking device. 
In this opening (point) there will appear a mov- 
ing line, which will go back and forth across the 
opening. This (point) is the control handle. 
As you move the handle forward, this pointer 
(show) will move to the right; as you move the 
handle backward, the pointer will move to the 
left. Now the pointer that you control will 
make a mark on the moving paper. Your job 
will be to match the mark you make with the 
line that is presented to you. Please use only 
one hand, the hand you start with. Are there 
any questions? 

To reduce fatigue effects, the subjects were 
given a two-minute rest period after the comple- 
tion of twenty cycles. After the subject had 
reached criterion on the original learning task, 
he was sent from the room while the control 
friction was changed. The time from comple- 
tion of original learning to the beginning of the 
transfer trials was, in all cases, two minutes. 
To observe the effect of the two-minute break, 
the control group was given a two-minute rest 
and then continued the task with the same fric- 
tion load. 

Criterion. The subjects were said to have 
learned the pattern when they did not deviate 
more than two millimeters from the stimulus 
line for three successive cycles. A trial was de- 
fined as one sine wave cycle. 
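
As an illustration only, the criterion can be expressed as a small scoring function. The sketch assumes that each trial is summarized by its largest deviation from the stimulus line in millimeters, and that the score is the number of trials completed before the first run of three successive criterion cycles begins. That scoring convention is an assumption, and the function name and sample values are hypothetical.

```python
def trials_to_criterion(max_deviations_mm, tolerance_mm=2.0, run_length=3):
    """Number of trials before the first run of criterion cycles, or None."""
    run = 0
    for trial, deviation in enumerate(max_deviations_mm):
        # A cycle meets criterion when it never deviates more than 2 mm.
        run = run + 1 if deviation <= tolerance_mm else 0
        if run == run_length:
            return trial + 1 - run_length
    return None  # criterion never reached in the recorded trials


# Hypothetical record: trials 5, 6, and 7 are the first three in a row within 2 mm.
print(trials_to_criterion([5.0, 3.0, 1.0, 4.0, 1.0, 2.0, 1.5]))  # 4
```

Under this convention a subject who meets criterion at once receives a score of zero.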

Experimental design. Seven experimental groups 
were assigned: 0 (approximately 2.5 ounces), 2, 
4, 6, 8, 10, and 12 pounds. The basic design 
was the familiar paradigm cited by Woodworth 
(17) as Plan 4: 


Transfer group     Learns A     Learns B
Control group                   Learns B


The control group selected was the six-pound 
friction group. Thus, three groups—8, 10, and 
12 pounds—transferred to the lower control 
pressure of six pounds. The groups 0, 2, and 4 
pounds transferred to the higher control pres- 
sure of six pounds. The basic design is the 
same in all cases. 

Measurement of transfer is recorded in per 
cent savings of trials. The formula used is from 
Gagne, Foster, and Crowley (5): 


\[
\text{Per cent transfer} = \frac{\text{Control group score} - \text{Transfer group score}}{\text{Control group score} - \text{Total possible score}} \times 100
\]





Since the response measure used was the num- 
ber of trials to criterion, the total possible score 
could be reduced to zero. 
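
The formula can be checked numerically. In the sketch below the function name is hypothetical, and the inputs are read from Table 1 on the assumption that the control group score is the six-pound group's mean of 27.3 trials in original learning and that each transfer group score is that group's mean number of transfer trials.

```python
def percent_transfer(control, transfer, total_possible=0.0):
    """Per cent savings in trials; total possible score is zero for trials."""
    return 100.0 * (control - transfer) / (control - total_possible)


CONTROL_TRIALS = 27.3  # six-pound group, original learning (as read from Table 1)

print(round(percent_transfer(CONTROL_TRIALS, 3.8)))     # 0-pound group: 86
print(round(percent_transfer(CONTROL_TRIALS, 2.8), 1))  # 12-pound group: 89.7
```

Both values agree with the per cent transfer figures reported for those groups in the Results.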

Subjects. A total of 105 Air Reserve Officer
Training Corps Cadets were used. The age 
range was 17 to 25 years with a mean of 19.6 
years. The subjects were assigned at random, 
on the basis of a table of random numbers (3), 
so that each experimental group contained 15 
subjects. One restriction was placed on the 
randomization, namely, the subjects were as- 
signed in blocks of seven. 
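
The restricted randomization described above can be sketched as follows. This is an illustration only: the authors drew on a printed table of random numbers, whereas the sketch substitutes Python's pseudo-random generator, and the function name and seed are hypothetical. Each block of seven consecutive subjects receives the seven friction loads in a random order, so 105 subjects yield 15 per group.

```python
import random


def assign_in_blocks(n_subjects, groups, seed=None):
    """Assign subjects to groups in randomized blocks of len(groups)."""
    if n_subjects % len(groups) != 0:
        raise ValueError("n_subjects must be a multiple of the number of groups")
    rng = random.Random(seed)
    assignment = []
    for _ in range(n_subjects // len(groups)):
        block = list(groups)   # every group appears once in each block
        rng.shuffle(block)     # order within the block is random
        assignment.extend(block)
    return assignment


groups = [0, 2, 4, 6, 8, 10, 12]   # friction loads in pounds
assignment = assign_in_blocks(105, groups, seed=1954)
counts = {g: assignment.count(g) for g in groups}
print(counts)
```

Blocking guarantees equal group sizes at every multiple of seven subjects, which plain random assignment would not.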


Results 


Original Learning. The mean number of 
trials to reach criterion is shown in Table 1 
for each experimental group. Since Bartlett’s 
test for homogeneity (3) showed the vari- 
ances of these scores to be homogeneous, and 
since the distribution of trials was found to 
be “moderately” normal, an analysis of 
variance was computed. There were no sta- 
tistically significant differences between the 
experimental groups in original learning. 

However, since the distributions did show 
some skewness, confirmation of the analysis 
of variance result was sought by the use of 
a distribution-free technique described by 
Mood (12) as “simple linear regression.” 
The application of this test gave results com- 
pletely in accord with those obtained from 
the analysis of variance. 

Transfer of Training. The mean number 
of transfer trials necessary to reach criterion 
for every experimental group is shown in 
Table 1. Per cent transfer of training was 
computed on the basis of the formula men- 
tioned previously. In Figure 1, per cent 
transfer of training is shown as a function of 
control friction. Individual transfer points 
are: 0 pounds, 86 per cent; 2 pounds, 91 per 
cent; 4 pounds, 90 per cent; 6 pounds (control), 
100 per cent; 8 pounds, 93 per cent; 
10 pounds, 90.8 per cent; and 12 pounds, 
89.7 per cent. 


Table 1 


Mean Number of Trials to Reach Criterion 


Experimental Group     Original     Transfer 
(pounds friction)      Learning     Learning 

 0                       28.6         3.8 
 2                       25.5         2.5 
 4                       27.0         2.7 
 6 (control)             27.3         0.0 
 8                       25.6         1.4 
10                       22.0         2.5 
12                       24.9         2.8 


Fig. 1. Per cent transfer as a function of control 
friction 

It will be recalled that the control group 
(6 pounds friction) was given a two-minute 
break after original learning and then con- 
tinued on the same task as may be seen in 
Table 1. There was no decrement of per- 
formance observed; the criterion level was 
maintained. This result may be interpreted 
as 100 per cent positive transfer and will be 
found, as such, in Figure 1. 

Ignoring the 6-pound control group, a test 
was made of the significance of differences be- 
tween the experimental groups on raw 
transfer scores. Since the distribution of 
transfer trials was highly skewed, the re- 
sults were evaluated by the distribution-free 
technique previously described as “simple 
linear regression.” The chi-square evaluation 
showed that the null hypothesis could not be 
rejected; there were no statistically significant 
differences between the transfer groups. 


Discussion 


Original Learning. The results indicate 
plainly that performance to criterion under 
these experimental conditions was independ- 
ent of control friction with the response meas- 
ure used. Of the literature previously cited, 
both Hick and Clarke (9) and Gray and Ell- 
son (6) have obtained similar results. 


Transfer of Training. The results indicate 
that a change in friction had very little effect 
on the level of performance. The lowest 
mean transfer for an experimental group was 
86 per cent for the 0 pound group. Since 
there were no significant differences between 
experimental transfer groups, these data show 
that transfer of training in this tracking task 
was relatively independent of control friction. 

The implication of these data for training 
devices seems clear. Where control forces 
are a variable, optimum transfer will be ob- 
tained by exact simulation; nevertheless, little 
will be lost if the control force is varied. Obvi- 
ously, since this conclusion rests on the re- 
sults of a relatively simple laboratory task, 
further validation with specific training de- 
vices seems necessary. 


Summary 


The effect of transfer from several amounts 
of friction to another level of friction in a 
manual control system was investigated. 
Transfer effect was found to range from 86 to 
93 per cent positive transfer; it was found 
that transfer was relatively independent of 
control friction under the conditions used in 
this study. Finally, control friction had little 
apparent influence on original learning with 
the criterion measure used. 
In a recent paper (1) on the Personal History 
method, the following conclusions were 
drawn: 


1. Five isolated personality trait scores, 
from two standard inventories, were a more 
“efficacious” assessment device than individ- 
ual reports derived by the Worthington Per- 
sonal History method. “Efficacy” was not 
defined by the authors, but presumably they 
meant accuracy in predicting the job effec- 
tiveness and promotability of the industrial 
employees in the study. 

2. This (they said) “constitutes damaging 
evidence as to the usefulness of the Personal 
History.” 

3. Furthermore (they continued) “these 
results . . . tend to follow the pattern of 
dubious or negative results found in valida- 
tional studies of other projective techniques.” 

As a matter of fact, even the selected data 
included in the Clark-Owens report would 
lead an impartial investigator to exactly the 
opposite conclusion on each of these points. 

Part of the explanation appears to lie in 
the fact that several major errors were com- 
mitted in designing and executing this little 
study. Constructive corrections for these 
were recommended to Clark on January 6, 
1953, following a conference with him in De- 
cember, 1952; but the issues still appear to 
be disregarded in the recent Clark-Owens 
article. 

1. The criterion was a set of ratings by 
co-workers (not supervisors), according to 
this pattern: Judges 1 and 2, in Dept. X, 
rated subject A; judges 3 and 4, in Dept. Y, 
rated subject B; and so on. No attempt was 
made to find the comparability of ratings 
made by different judges in different departments. 
Thus, the reliability of the criterion 
is an unknown quantity, with an error of un- 
known but undoubtedly considerable size. If 
it were not that these ratings proved sig- 
nificantly related to both the PH ratings and 
the standard inventories, implying some kind 
of meaningful stability in the criterion, this 
feature would invalidate the entire study. 

2. The research was ultimately narrowed 
to a few traits, apparently because only these 
traits could be measured by the inventories. 
The proper procedure, of course, to validate 
the PH reports, would be to measure those 
traits which the PH covers. (Editorial policy 
does not allow space for an illustrative PH 
report. See reference 10, for an example.) 
The task of measuring the interaction of 
traits, which the PH undertakes to do, is be- 
yond the scope of standard-inventory scores, 
of course, especially if the scores are taken 
singly. Perhaps for this reason, this latter 
issue was ignored. In short, the study was 
not really adequately designed to test the 
validity of the PH. 

3. A peculiar and persistent error in using 
the Chi-square method is explained below in 
Conclusion No. 2. 

4. Despite the fact that Clark and Owens 
(erroneously) termed the contingency co- 
efficients for both PH and inventories “not 
significant,” they proceeded to compare the 
coefficients for the two methods, though only 
on five personality traits. In doing this, they 
apparently did not realize that contingency 
coefficients from different sets of data cannot 
be compared unless a class-index correction 
is applied (2). These are not correlation co- 
efficients. Without the correction, it is im- 
possible to tell whether a C of .75 from one 
set of data is larger than, equal to, or smaller than a 
C of .65, .75 or .85 from different data. This 
is a relatively minor point, but it is still an 
error. 
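
The point about comparability can be illustrated with a short sketch. It assumes the usual Pearson definition, C = sqrt(chi-square / (chi-square + N)), and uses the known upper limit of C for a table whose smaller dimension is q, namely sqrt((q - 1) / q). Whether this limit is exactly the "class-index correction" of reference (2) is an assumption here, and the tables are hypothetical, not the Clark-Owens data.

```python
import math


def chi_square(table):
    """Pearson chi-square and total N for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency_coefficient(n_rows, n_cols):
    """Upper limit of C for a table of the given size."""
    q = min(n_rows, n_cols)
    return math.sqrt((q - 1) / q)


# Hypothetical tables, each showing perfect association.
two_by_two = [[10, 0], [0, 10]]
three_by_three = [[5, 0, 0], [0, 5, 0], [0, 0, 5]]

c2 = contingency_coefficient(two_by_two)
c3 = contingency_coefficient(three_by_three)
print(round(c2, 3), round(c3, 3))
print(round(c2 / max_contingency_coefficient(2, 2), 3),
      round(c3 / max_contingency_coefficient(3, 3), 3))
```

Both hypothetical tables show perfect association, yet the raw coefficients differ (about .707 against .816) because the number of cells differs. Dividing each by its upper limit puts the two on a common scale.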


The Correct Conclusions from the Data 


Despite the questionable or fallacious pro- 
cedures, the actual data which Clark and 
Owens report clearly show the following facts: 

1. The Personal History reports were trans- 
lated into personality-trait ratings and into 
job performance ratings, by five psycholo- 
gists, with a high degree of reliability (Ad- 
justment to Others .91, Job Effectiveness .79, 
Promotability .93, for example). 

2. The Personal History ratings thus ob- 
tained showed a high, significant relationship 
to the criterion, both on the personality traits 
and on Job Effectiveness, Adjustment to Co- 
workers, and Promotability. 


Contingency Coefficients (C), PH vs. Ratings 


Active                      .605 
Impulsive                   .655 
Dominant                    .654 
Stable                      .676 
Sociable                    .585 
Job-effectiveness           .513 
Promotion Possibilities     .697 
Adjustment to Others        .614 


Through a misuse of chi-square methods 
(pointed out to them in the letter of Janu- 
ary 6, 1953), the authors report that these 
contingency coefficients, ranging from .51 to 
.70, are “not statistically significant.” Mr. 
Clark reported, in December 1952, that this 
happened because the 47 subjects were sub- 
divided into many cells, several of which con- 
tained less than 5 cases. Since an extremely 
large correction factor has to be applied—a 
procedure which is not acceptable, even tech- 
nically, to most statisticians—almost no co- 
efficient would appear significant, no matter 
how high. This is a technically possible, but 
logically meaningless, procedure. However, 
their own findings indicate that if proper chi- 
square divisions were applied to these data, 
both the Personal History and the standard 
tests would show a significant degree of rela- 
tionship with the criterion ratings. This, at 
least, is our considered opinion, and that of 
several other statistically competent psycholo- 
gists (3, 4). 

3. The standard inventories showed a sig- 
nificant relationship to the criterion on five 
isolated personality traits, of about the same 
order as the PH-criterion relationship on 
these five traits (Active, Impulsive, Domi- 
nant, Stable, Sociable). However, these in- 
ventory trait scores show no power to predict 
Job Effectiveness, Adjustment to Co-workers, 
or Promotability. Indeed, it appears from 
the Clark-Owens report that no effort was 
made to attempt such a prediction from the 
inventories, although the criterion was avail- 
able. 

4. Thus, on the crucial criteria for de- 
termining the efficacy, as well as the validity, 
of any assessment method (5)—Adjustment 
to Co-workers, Job Effectiveness, and Pro- 
motability—the Clark-Owens data show that 
the Personal History method was significantly 
effective. Since the authors report no at- 
tempt to measure the predictive power of the 
standard inventories against these criteria, 
the “efficacy” of those inventories for predict- 
ing job performance remains wholly untested 
and unproven. Indeed, since the PH meas- 
ured the individual traits about as well as the 
inventories, and additionally measured job 
performance, it would seem that the inven- 
tories are not needed, in this setting. This is 
contradictory, of course, to the statements 
Clark and Owens made about “efficacy.” 

5. Clark and Owens’ remark about “dubi- 
ous or negative” findings on other validation 
studies of projective techniques requires refer- 
ence to numerous studies which have demon- 
strated positive validity for these methods. 
Naiveté, or errors of logic, in research design 
and in the use of statistical methods, have 
frequently resulted in “dubious” findings. 
However, properly designed research has re- 
peatedly shown that projective techniques, 
among them the Personal History method, 
can be valid predictors of overt, daily be- 
havior in the work world, as well as in the 
clinic (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16). 


Received June 4, 1954. 
Published out-of-turn by the editor. 
A Reply to Drs. Peck and Stephenson 


William A. Owens, Jr. 
Iowa State College 


Drs. Peck and Stephenson, as might have 
been anticipated from their obvious interest, 
have seen fit to make some interesting and 
ingenious comments upon the Clark-Owens 
study (1) of the Worthington Personal His- 
tory Blank (PH). However, since their com- 
ments purport to be “a correction” they 
should be examined in order. 

1. Peck and Stephenson say the criterion 
employed, that of associates’ ratings, is un- 
reliable, although they state that it “proved 
significantly related to both the PH ratings 
and the standard inventories.” Actually, 
these relationships were not statistically sig- 
nificant, although this “unreliable criterion” 
was found to be more closely related to stand- 
ard inventory results than to PH results (on 
the only five traits presumably measured by 
both) five times out of five. In this regard, 
at least, it was quite consistent. 

2. Peck and Stephenson seem to feel that 
the five traits common to PH and the avail- 
able standard inventories were not enough to 
constitute any real evidence as to the validity 
of PH. It was, of course, only in the case of 
these five traits that PH could really be 
evaluated, since low criterion relationships 
could well be attributed to low criterion re- 
liability or validity unless they were differ- 
entially low. They also state that the PH 
measures the interaction of traits (we pre- 
sume clinically, since no quantitative evidence 
is quoted), and that it, therefore, goes beyond 
the scope of standard inventories in a global 
direction. However, the obtained Clark- 
Owens estimate of the relationship between 
PH results and criterion ratings is lowest in 
the case of “job effectiveness,” a complex 
characteristic, and about average for “promotion 
possibilities” and “adjustment to 
others.” It would thus seem that the PH 
does not yield better global than simple esti- 
mates, in this sample. 

3. Peck and Stephenson accuse Clark and 
Owens of a “peculiar and persistent error in 
using the Chi-square method,” in spite of 
their earlier advice to us. Let us examine 
their arguments. (a) They say that we di- 
vided our 47 cases among too many cells, 
“several of which contained less than 5 cases.” 
However, the theoretical consideration relates 
to expected frequencies, not to observed, and 
even so, the number 5 is relatively arbitrary 
(5). (b) They imply that some enormous 
correction for continuity should have been 
made and was ignored, whereas Cochran 
(2) states that “Tables with more than 1 
degree of freedom and some expectations 
greater than 5 -- should -- use χ² without correction 
for continuity.” (c) Finally, they conclude 
that, in their considered opinion, both 
the PH and standard inventory results would 
be significantly related to the criterion in, 
say, a 2 × 2 table. How this could happen is 
a bit hard to understand, since Guilford (3) 
states, “There is probably nothing to be 
gained by applying Yates’s correction when 
there is more than 1 degree of freedom.” 
And again, still quoting Guilford, “The effect 
of the correction is to reduce the size of χ².” 
Thus, had Clark-Owens followed the pro- 
cedure suggested by our critics, the effect 
would have been to remove the obtained χ² 
values still further from significance. 

4. Peck and Stephenson say, quite cor- 
rectly, that contingency coefficients cannot be 
compared without making a class-index cor- 
rection. They also say, even more correctly, 
that “this is a relatively minor point.” Ac- 
tually, making this correction would do prac- 
tically nothing to the relative magnitudes of 
PH vs. test validities. The test coefficients 
would tend to receive larger corrections, since 
they are initially larger; and the PH would 
tend to receive larger corrections because the 
number of cells is somewhat smaller. If Peck 
and Stephenson had bothered to compute it, 
they could have observed that the differential 
shifts could not have exceeded .01 or .02. 
This, of course, would not remotely approach 
changing the direction of a single difference 
—and direction is all that is involved in the 
randomization test. 

5. In their second section, purportedly 
dealing with “Correct Conclusions from the 
Data” Peck and Stephenson become very 
seriously, if not willfully, confused. They ap- 
pear to mistake an omission for a negative 
result saying, “these inventory trait scores 
show no power to predict Job-Effectiveness, 
Adjustment to Co-workers, or Promotability.” 
The reason they do not is that, in an attempt 
to be fair to the PH method, Clark-Owens 
did not report them. Actually, three of our 
judges subsequently did considerably better 
in predicting these three characteristics from 
the objective test results than from the PH. 
However, they were more familiar with the 
former, and the data may have been slightly 
contaminated. In any case, it was Clark- 
Owens’ stated purpose to evaluate PH vs. 
objective tests—not PH vs. objective tests 
plus an imponderable interpreter of them. 
Peck and Stephenson surely realize that the 
tests do not yield scores on these three char- 
acteristics. 


6. Finally, Clark-Owens’ critics take them 
to task for a comment about “the pattern of 
dubious or negative results found in valida- 
tional studies of other projective techniques.” 
An answer to them requires only a reference 
to Schofield (4), who summarizes all validity 
studies reported in 1949, 1950, and 1951, 
and indicates that two-thirds to three-fourths 
of them yielded negative results. 
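
The distinction drawn in point 3, between observed and expected cell frequencies, can be made concrete with a brief sketch. The table below is hypothetical (47 cases spread over nine cells) and is not the Clark-Owens data. Expected frequencies are computed in the usual way, as row total times column total divided by N, and the function names are illustrative only.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def count_below(cells, threshold=5):
    """Count the cells whose value falls below the threshold."""
    return sum(1 for row in cells for value in row if value < threshold)


# Hypothetical 3 x 3 table of 47 cases; NOT the Clark-Owens data.
observed = [
    [9, 4, 3],
    [4, 8, 4],
    [3, 4, 8],
]
expected = expected_frequencies(observed)

observed_small = count_below(observed)
expected_small = count_below(expected)
print(observed_small, expected_small)
```

In this hypothetical table six observed cells hold fewer than 5 cases, but only one expected frequency falls below 5, so a rule stated in terms of expectations is far less restrictive than one stated in terms of observed counts.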

All-in-all, Clark-Owens must firmly reject 
the alleged corrections of Peck-Stephenson, 
although fully granting the limitations of 
their study as originally set forth. 


Received July 20, 1954. 
Published out-of-turn by the editor. 
Applied Psychology in Action 


GATB in Foreign Countries 


Beatrice J. Dvorak 


Testing Branch, U. S. Employment Service, Washington 25, D. C. 


The USES General Aptitude Test Battery 
has been translated into a number of foreign 
languages, and research is being conducted in 
these foreign countries to adapt and stand- 
ardize it for use on populations in those coun- 
tries. Permission has been granted by the 
U. S. Employment Service to the following 
organizations and individuals to use the 
GATB in such research. While information 
is not available regarding the status of all of 
these projects, it is known that the French, 
Japanese, Portuguese, and Spanish editions 
have already been published. 


Argentina 


Carlos A. Pourteau Agote 
Universidad de Buenos Aires 
Laboratorio Psicotecnico 
Republica, Argentina 


Australia 
H. A. Bland 
Department of Labour and National Service 
Melbourne, Australia 


Belgium 
R. Buyse 
University of Louvain 
Tournai, Belgium 


Jean Herickx 
Centre d’Orientation 
Bruxelles, Belgium 


M. Dewals 
Psychotechnicien de la Société Nationale 
des Chemins de Fer Vicinaux 
Bruxelles, Belgium 


Capitaine Commandant Hourman 
Chef du Centre d’Orientation 
Ministère de la Defense Nationale 
Bruxelles, Belgium 


F. Vandenborre 
Ministère de l’Instruction Publique 
Bruxelles, Belgium 


Brazil 
Jacy Magalhaes 
Divisao de Organizacao do Trabalho 
Rio de Janeiro, Brazil 


Livraria Oscar Nicolai 
Caixa Postal 246 
Brazil 


S. J. Schwarzstein 
Director do Servico de Colocacao e Informacao Profissional 
Sao Paulo, Brazil 


Secretaria do Trabalho 
Servico de Colocacao e Informacao Profissional 
Sao Paulo, Brazil 


Canada 
Morgan D. Parmenter 
Director, The Guidance Centre 
University of Toronto 
Toronto 5, Canada 


China 
Ministry of Social Affairs 
Shanghai, China 


Denmark 
Poul Bahnsen 
Director, Psykotekniske Institut 
Copenhagen K, Denmark 


Paul Vidriksen 
Arbejdsdirektoratet 
Kopenhavn, Denmark 


England 
M. Desai 
Psychological Department, London County Council 
London, England 


H. J. Eysenck and J. Tizard 
The Maudsley Hospital 
London S. E. 5, England 


C. B. Frisby 
Director, National Institute of Industrial Psychology 
London W. C. 2, England 


Roland Harper and D. R. Martin 
The University of Leeds 
Leeds 2, England 


B. W. Richards 
St. Laurence’s Hospital 
Caterham, Surrey, England 


Constance M. Mathieson 
East Anglian Regional Hospital Board 
Norwich, Norfolk, England 


Alec Rodger 
Birkbeck College, University of London 
London, England 


India 


Vocational Guidance Bureau 
Bombay, India 


Italy 
Ing. Vincenzo Flagiello 
Societa per l’Industria e l’Elettricita 
Centro Istruzione Professionale 
Viale Benedetto Brin 
Terni, Italy 


Agostino Gemelli 
Director, Laboratorio di Psicologia veenteidis 
Milano, Italy 


Guido Majaron 
Viale Arnaldo Fusinato 2F 
Vicenza, Italy 


Consiglio Nazionale delle Ricerche 
Instituto Nazionale di Psicologia 
Rome, Italy 


New Zealand 


Auckland University College 
Auckland C. 1, New Zealand 


W. J. H. Clark 
Vocational Guidance Centre 
Auckland, New Zealand 


Peru 
Santiago Salinas 
Ministerio de Trabajo y Asuntos Indigenas 
Lima, Peru 


Philippines 
Antonio V. Roxas 
Escolta, Manila, Philippines 


Scotland 


P. S. Boyd and W. M. Miller 
Department of Mental Health 
Aberdeen, Scotland 


South Africa 
D. J. Du Plessis 
Department of Labor 
Johannesburg, Union of South Africa 


C. P. J. Erasmus 
University of the Orange Free State 
Bloemfontein, South Africa 


Department of Psychology 
University of Stellenbosch 
Stellenbosch, South Africa 


Evryl Fisher 
Church Street 
Cape Town, South Africa 


Sweden 
Torsten Husen 
Centrala Varnpliktsbyran 
Personalprovningsdetaljen 
Stockholm 10, Sweden 


Switzerland 
J. F. Herzog 
Office d’Orientation Professionelle 
Neuchatel, Switzerland 


Ph. H. Muller 
Université de Neuchatel 
Neuchatel, Switzerland 


Turkey 
Faruk Kardam 
Director-General of the Turkish Employment 
Service 
Ankara, Turkey 





Book Reviews 


Tuckman, J. and Lorge, I. Retirement and 
the industrial worker: prospect and reality. 
New York: Bureau of Publications, Teach- 
ers College, Columbia University, 1953. 
Pp. xvi + 105. $2.75. 


This book reports the results of a survey 
undertaken at the request of the New York 
Cloak Joint Board of the International Ladies’ 
Garment Workers’ Union. The study in- 
vestigated, by means of personal interviews, 
the attitudes toward retirement of three dif- 
ferent groups of persons. These groups con- 
sisted of (1) 204 men and women still on 
their jobs, (2) 216 men and women who had 
submitted applications for retirement but who 
were still working, and (3) 240 retired per- 
sons. All interviewees were or had been 
members of the above named union, and all 
had earned their livelihood in the needle 
trades. The schedules used in the inter- 
views were designed to obtain information 
relative to a wide variety of employment-re- 
tirement questions which fall generally under 
six main headings. These headings form the 
outline for the book and include: Retirement 
Attitudes, Health, Pressure Effect of Aging 
on Work Performance, The Worker’s Prepara- 
tion for Retirement, Effect of Retirement on 
the Family, and Factors Related to Retire- 
ment Attitudes. 

Results are, of course, reported in terms 
of percentages of respondents falling in each 
of various response categories. Statistical sig- 
nificance is tested by means of chi square 
tests. It is to the authors’ credit that they 
stay close to their facts and figures. They 
do not commit the error (so common in the 
literature about the problems of older em- 
ployees) of launching into long opinionated 
discussions. Nor do they attempt to derive 
generalizations from their data which are not 
warranted by the narrowness of the popula- 
tion studied. 

The last pages of the book consist of an 
excellent summary of the study and a short 
section devoted to conclusions and recom- 
mendations. It is this last section that will 
prove most useful to other persons doing re- 
search on older employee utilization. For it 
is here that one finds a wealth of hypotheses 
that need to be tested on a broader basis. 
The primary barriers to the utilization and/or 
happy retirement of older persons are 
clearly outlined and questions are formulated 
which could well form the framework for 
other research programs designed to find 
methods of overcoming these barriers. 

Presentation of the survey results could 
have been made much clearer and more easily 
understood. As it is, the reader is confronted 
with table after table of percentages which, 
although clearly titled and well organized, 
finally contribute to an overwhelming sense 
of boredom. Simple bar diagrams, pie charts, 
frequency polygons, and histograms could 
have been used to great advantage to facili- 
tate quick and accurate interpretation of the 
results presented. 

This book represents one of the most ex- 
tensive researches into the attitudes and prob- 
lems of working, retiring, and retired work- 
ers yet performed. As such, it is a “must” 
for persons engaged in the study of employ- 
ment and retirement problems of older em- 
ployees. In addition to the wealth of data 
presented, it is a rich source of research hy- 
potheses, and points up the problems which 
must still be solved by researchers in this area. 

Marvin D. Dunnette 


Industrial Relations Center 
University of Minnesota 


Berdie, R. F. (Editor). Roles and relation- 
ships in counseling. Minnesota Studies in 
Student Personnel Work, No. 3. Min- 
neapolis: University of Minnesota Press, 
1953. Pp. 37. $1.25. 


This publication consists of three papers 
presented at the Second Annual Conference 
of Administrators of College and University 
Counseling Programs held at the University 
of Illinois in 1951. 

In the first paper, John Gustad discusses 
the definition of counseling. Clinical psy- 
chology and counseling can be considered to 
be essentially “one general kind of endeavor 
but with differing emphasis.” Both include 
psychotherapy “where appropriate to the 
client and within the province of the prac- 
titioner.” Teaching and counseling are dif- 
ferentiated largely in terms of different train- 
ing and experience. His definition of coun- 
seling stresses the role of learning and the 
requirement of professional competence. The 
analysis of the problem and review of the 
literature are helpful. 

In the next paper Ralph Berdie describes 
the tactics and techniques developed in the 
Counseling Bureau at Minnesota to deal with 
problems of human and institutional relation- 
ships (which he terms public relations “in a 
very limited sense”). His base point is ef- 
fective service. Beyond this he describes a 
number of gambits to improve client rela- 
tionships and outlines an equally active pro- 
gram to promote intra- and extra-institutional 
staff relationships. Counseling administra- 
tors will find a number of suggestions for per- 
formance of a function rarely discussed in 
print. Unfriendly voices will perhaps find 
evidence to reinforce their suspicions of “em- 
pire-building.” 

Harold Pepinsky’s concluding paper argues 
the thesis that the counseling psychologist 
“can help to build a culture in which the in- 
dividual members are able to communicate 
with each other, to respond positively to each 
other, and to work together toward common 
group objectives,” and provides a rationale 
for the use of group procedures in pursuit of 
this aim. Such activities appear to be geared 
to reaching a much larger proportion of the 
student body in the interests of community- 
wide mental health. 

On a limited sector, the college or univer- 
sity campus, these papers exemplify a phe- 
nomenon of our times—the development (in- 
cluding growing pains) of new professional 
groups eager to provide experiences aimed at 
helping man to cope with the “human pre- 
dicament.” It is understandable that coun- 
seling psychologists are zealous and ambiti- 
ous; the need is great. 


Arthur H. Brayfield 
Kansas State College 


Bross, Irwin D. J. Design for decision. New 
York: Macmillan, 1953. Pp. viii + 276. 
$4.25. 


Every so often a book arrives for me to 
review that I find interesting and exciting. 
Bross’s Design for decision is one of the few 
that falls into this category. It was read 
hastily once with enthusiasm and unflagging 
interest and almost without interruption of 
any sort. I could hardly wait to finish one 
chapter so that I could move on to the next. 
During the weeks that followed the first read- 
ing, I picked up the book many times to read 
various sections at a more leisurely pace and 
my enthusiasm for it has not diminished to 
any noticeable extent. 

It may be granted that the book intro- 
duces nothing that is not the common knowl- 
edge of all modern statisticians. But it does 
say what it has to say in a manner that few, 
if any, other modern statisticians have been 
able to say it in. Bross can write and he 
writes very well indeed. 

Do not let that word “statisticians” that I 
used above mislead you. This is not a book 
about statistics in the sense in which you 
may interpret that word. It is not, for ex- 
ample, a collection of formulas, illustrative 
calculations, mathematical derivations and 
proofs. As the author states, no mathematics 
is required for reading it other than high 
school algebra. As a matter of fact, even if 
you have forgotten your high school algebra, 
you'll still get along with the text pretty well. 
Rather, this is a book about decision making 
or, more precisely, about statistical decision. 

What is statistical decision? You may get 
some indication of what statistical decision in- 
volves from the following listing of chapter 
headings: history of decision, nature of de- 
cision, prediction, probability, values, rules 
for action, operating a decision-maker, se- 
quential decision, data, models, sampling, 
measurement, statistical inference, statistical 
techniques, design for decision. 

My answer, although admittedly inade- 
quate, is that statistical decision is a method 
for making decisions that has its origins in 
a variety of specialized fields. I might even 
go so far as to identify statistical decision 
with scientific method, though Bross may not 
agree with this viewpoint. Anyway, the best 
answer as to what statistical decision is can be 
obtained by reading Bross’s book for yourself. 

I should add that there is something in 
this book for everyone. If you have no sta- 
tistical training, Design for decision will tell 
you how a modern decision-maker operates— 
without overwhelming you with mathematical 
details. If you have some experience in ap- 
plied statistics, then, as Bross points out, you 
may find that this book “provides a vantage 
point from which it is possible to see all of the 
scattered techniques in their proper perspec- 
tive.” 

“Some readers may be intrigued by the 
ideas of Statistical Decision because they rep- 
resent a new advance toward the solution of a 
basic human problem. The principles have a 
wide scope; they apply to the choice of a 
foreign policy or to the private decisions that 
we all must make. They are, if you like, 
philosophical principles, a way of looking at 
the world in which we live, a guide to action 
in that world.” 

If the above paragraph from the introduc- 
tory chapter of Bross’s book doesn’t whet your 
appetite and stir you to march out to your 
nearest library or bookseller for a copy of 
Design for decision, then you are a lost cause 
and no additional words of praise of mine for 
this book are going to help. 

Allen L. Edwards 


The University of Washington 


Remmers, H. H. Introduction to opinion and
attitude measurement. New York: Harper 
& Bros., 1954. Pp. 437. $5.00. 


Prepared as a college textbook, this volume 
by Dr. H. H. Remmers, professor of psychol-
ogy and director of the Division of Educa-
tional Reference at Purdue University, offers
a panoramic view of the field of opinion and 
attitude measurement, with emphasis divided 
between method and application. 

“The realization is rapidly growing,” the
author says, “that attitudes, the way indi- 
viduals and groups feel about the various as- 
pects of their world, are probably more de- 
terminative of behavior than mere cognitive 
understanding of this world.” 

Part I is devoted to a discussion of tech-
niques, including sampling and statistical
theory, scaling, single question evaluation,
the “summated questionnaire,” and some of
the “less direct measures of attitudes”—pro-
jective methods, sociometric approaches, rat-
ing scales, and the concept of empathy.

In Part II, Dr. Remmers describes many 
of the varied uses to which attitude and opin- 
ion measurements have been put in business, 
industry, the government, the study of com- 
munity interrelations, and education. 

The book is fairly comprehensive, succinct, 
and—in the main—a readable presentation. 
Appended to each of its dozen chapters are a
brief critical summary, a list of questions,
and a bibliography. 

The chapter on scaling techniques contains 
an able exposition of the Thurstone and 
Likert contributions, moves on to scale analy- 
sis as developed by Guttman, and deals in 
some detail with “the Cornell technique,” be- 
cause, says Dr. Remmers, it appears to be the 
one “most likely to be feasible in the greatest 
variety of situations.” 

His description of personality, interest, and 
problem inventories is of equal merit. He 
offers a quite lengthy report on the procedures 
that were followed in developing the Science 
Research Associates’ Youth Inventory, under 
auspices of the Purdue Opinion Panel, with 
which the author is identified. 

Dr. Remmers treats extensively of the ap- 
plications of attitude and opinion measure- 
ments by educators. He reviews also the 
utilization of similar methods by the busi- 
nessman to improve his advertising programs 
and his products; by industry in the study of 
employee attitudes, plant morale, absentee- 
ism, and workers’ opinions of members of 
minority groups; by social researchers in the 
analysis of intergroup and interpersonal re- 
lationships. 

Unhappily, the book appears to lack fresh- 
ness. Some of the material is obviously 
“dated”; one gains the impression that the 
author, except in a few instances, stopped 
collecting data along about 1947 or 1948, 
though much that is worth while has appeared 
in the literature since then. To illustrate 
Census Bureau sampling, he describes what 
was done in the 1940 census; to describe the 
government’s uses of attitude and opinion 
studies, he dwells on World War II opera- 
tions; to indicate the scope and nature of the 
Survey of Consumer Finances, he discusses 
what was done in the first year, 1946. With 
a mild apology for its absence, the author 
omits any material on how the television in- 
dustry has put social science research to work. 

The implication conveyed by the word “In- 
troduction” in the title, that this is a textbook 
for beginners, may be somewhat misleading; 
it quickly becomes apparent that the college 
student will find himself in deep water unless 
he has been forearmed with preliminary work 
in statistics and psychology. 

Notwithstanding, the volume is a scholarly 
and well-planned treatise. In writing it, Dr.
Remmers has made a substantial contribu- 
tion toward effecting the kind of “popular 
understanding of the importance and impli- 
cations” of the findings of the social scien- 
tists which he, at the outset, urges. The 
book is one of Harper’s Psychological Series, 
of which Gardner Murphy is editor.

Sidney S. Goldish


Minneapolis Star and Tribune 


Sherif, Muzafer and Wilson, M. O. (eds.). 
Group relations at the crossroads. New 
York: Harper, 1953. Pp. viii + 379. $4.00. 


Like the preceding Social Psychology at 
the Crossroads, this volume is a collection of 
papers emphasizing social-psychological con- 
cepts and explanations, prepared for a con- 
ference at the University of Oklahoma (April, 
1952). 

Sherif begins with an excellent summary 
introduction. Next comes J. P. Scott’s “Im- 
plications of Infrahuman Social Behavior for 
Problems of Human Relations.” This is one 
of the few fairly extensive reviews of the 
literature in the book. Scott uses the con- 
cept of levels of organization to describe 
phylogenetic differences in social behavior. 

In considering “Psychological Traits and 
Group Relations,” Anne Anastasi traces in 
detail the changes in approach in the area 
from a quest for a racial hierarchy to a more 
sophisticated multiple trait approach empha- 
sizing the use of analysis of variance and fac- 
tor analysis in which interactions between vari- 
ous traits and groups are expected to exist. 

Anselm Strauss’ “Concepts, Communica- 
tion and Groups” discusses the primary im- 
portance of language in the development of 
human social behavior. 

J. J. Gibson’s “Social Perception and Psy- 
chology of Perceptual Learning” is an out- 
line of the process of perceptual learning in 
terms of generalization and differentiation. 

Gardner Murphy’s “Knowns and Un- 
knowns in the Dynamics of Social Percep- 
tion” considers the importance of differential 
group membership on differential perception, 
the lines of cleavage between groups, and 
their significance. 

“Development of the Small-Group Re- 
search Movement” by R. E. L. Faris covers 
mainly sociological studies in the area.

Herbert Blumer’s “Psychological Impact of
the Human Group” is a reiteration of the 
need to study both the group situation and 
the individual in developing an adequate 
theory of human interaction. An attempt is 
made to spell out some significant elements 
of group behavior. 

Sherif’s paper on reference groups reiter- 
ates the importance of reference groups as 
distinct from membership groups in under- 
standing social behavior. 

Leon Festinger’s “An Analysis of Com- 
pliant Behavior” is a discussion of the em- 
pirical validity of two hypotheses: (1) pub- 
lic compliance without private acceptance will 
occur if the person in question remains com- 
pliant and in the group to avoid punishment; 
(2) public compliance with private accept- 
ance will occur where it is satisfying to re- 
main with those influencing the person. 

Launor Carter’s “Leadership and Small 
Group Behavior” is primarily a summary of 
his experimental studies on the behavior of 
leaders, the generality of leadership, and the 
effects of the group on leader behavior. 

The need to consider “social distance” rela- 
tively, and to differentiate it from prejudice, 
is the theme of Mozell Hill’s paper. 

Nelson Foote and Clyde Hart’s “Public 
Opinion and Culture Behavior” point to: 
(1) the dangers of depending only on poll 
answers to gauge public opinion; (2) the 
possibilities of analyzing public behavior in 
order to assess public opinion. 

Helen Jennings concludes with a survey 
combining conclusions from sociometric stud- 
ies with, as yet, unpublished sociodramatic ex- 
amples to point out their significance for un- 
derstanding personality and group formation. 

Despite the heterogeneity of aims and 
methods of each of the papers, the reviewer 
will hazard presenting some overall impres- 
sions of the book. 

1. While such general psychology topics as 
perceptual development have been included, 
contributions from many of the largest re- 
search programs on group relations such as 
the Ohio State Leadership Studies and the 
Michigan Survey Research Center are ex-
cluded. Anthropology is also absent.

2. Some of the papers could have been 
more of a contribution had they more exten- 
sively surveyed the literature. Yet, in gen-
eral, most papers tended to maintain a high
standard of excellence. 

3. Social psychologists seem to be trying 
hard to adopt sociological concepts and to 
integrate their work via “interdisciplinary” 
research with the other social sciences. It 
may be both more parsimonious and profit- 
able for social psychologists to integrate their 
research with the general psychology of
learning, perception and motivation. This 
does not mean that rejecting many sociologi- 
cal concepts will lead to ignoring the nature 
of the stimulating situation while studying 
social behavior. Rather, the situation will 
be described in terms which will lend them- 
selves more readily to integration with psycho- 
logical concepts describing the other equally 
important determinants of social, and all be- 
havior, i.e., the behavioral history, motiva- 
tion and biological level of the behaving or- 
ganisms. 

Bernard M. Bass 


Louisiana State University 


Schlotter, Bertha E. and Svendsen, Margaret.
An experiment in recreation with the men-
tally retarded. (Rev. ed.) State of Illi-
nois, Department of Public Welfare. Pub-
lished by National Mental Health Funds,
1951. Pp. 142. Gratis.


This book is a re-issue of the volume pub- 
lished in 1932. Additions are a new introduc- 
tion by the director of the Department of 
Public Welfare, and a thirteen page preface 
by Bertha E. Schlotter which provides an 
overview of the continuing effects of the rec- 
reational program begun in 1929. Requests 
from other institutions for copies of the 
earlier publication led to this re-issue. 

The Illinois Institute for Juvenile Re- 
search, in a survey of the recreational pro- 
gram at the Lincoln State School and Colony 
in 1929, reported institutional overcrowding, 
inadequate facilities and staff, poor use of 
facilities, and overemphasis on maintaining 
quiet and order. Recreational activities pro- 
vided for active participation by only 100 of 
the 2,600 patients then under care. The es- 
tablishment of a department of recreation and 
a one-year experiment with a recreation pro- 
gram on an institution-wide basis followed. 
This book discusses staff qualifications, in-
service training programs, grouping of pa-
tients, and equipment and space problems. 
Specific lists of equipment, musical selections, 
books, and activities are included, with com- 
ments concerning their use and modification. 
About half of the book is devoted to “socio- 
psychological analysis of play activities.” 
This section classifies activities in several 
ways: alphabetically, with minimum MA 
indicated; grouped for several MA levels; 
according to the degree of motor activity; 
according to need for equipment; and ac- 
cording to type of social organization and 
participation. 

There are some important values of the 
book: the inclusion of lists of source books 
for games, songs, activities, and dances is of 
special interest to the recreational worker; 
the beginning worker will benefit by the 
vicarious experience made available. There 
are useful suggestions for modifying activi- 
ties to suit special needs, and “leads” as to 
the handling of difficult patients. There is a 
real exemplification of the wide practical im- 
plications of the concept of individual differ- 
ences in work with defectives. 

From the viewpoint of this reviewer, it is 
unfortunate that the author attempted in the 
preface to defend the program in terms of 
psychological “principles” which are often in- 
consistent and contradictory, and sometimes 
not principles at all. The psychologically un- 
sophisticated reader might be over-impressed 
by the comment, “This belies the belief that 
punishment, drill, and rewards are justified 
in the treatment of mental defectives” (p. 
12). Comments concerning the level of per- 
formance attained would also be misleading 
to the neophyte in the field of mental de-
ficiency: i.e., “In their dancing they show
skill, variety, imagination, and spontaneity” 
(p. 17). Without at least a reminder to the 
reader of relative standards of expectation, 
such statements are potentially dangerous. 

A recreation worker interested in the men- 
tally deficient should study this book, but 
should maintain a cautious attitude toward 
the generalizations while embracing the spe- 
cific helpful suggestions and making full use 
of the source material. 

Harriet E. Blodgett 


Institute of Child Welfare, 
University of Minnesota 





